Open Collections

UBC Theses and Dissertations

Acoustic and perceptual cues to gender identification : a study of transsexual voice and speech characteristics Wollitzer, Lisa Candice 1994

Full Text

ACOUSTIC AND PERCEPTUAL CUES TO GENDER IDENTIFICATION: A STUDY OF TRANSSEXUAL VOICE AND SPEECH CHARACTERISTICS

LISA CANDICE WOLLITZER
B. A., The University of British Columbia, 1990

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTERS OF SCIENCE in THE FACULTY OF GRADUATE STUDIES (School of Audiology and Speech Sciences)

We accept this thesis as conforming to the required standard

THE UNIVERSITY OF BRITISH COLUMBIA
October 1994
© Lisa Candice Wollitzer, 1994

Signature(s) removed to protect privacy

In presenting this thesis in partial fulfilment of the requirements for an advanced degree at the University of British Columbia, I agree that the Library shall make it freely available for reference and study. I further agree that permission for extensive copying of this thesis for scholarly purposes may be granted by the head of my department or by his or her representatives. It is understood that copying or publication of this thesis for financial gain shall not be allowed without my written permission.

The University of British Columbia
Vancouver, Canada

ABSTRACT

Few studies address the assessment and management of the voice and speech patterns of Male-to-Female Transsexuals (MTS). The clinician working with this population may therefore be faced with considerable difficulty in planning valid and effective intervention programmes. The purpose of this study was to investigate acoustic and perceptual-acoustic aspects of MTS voice to suggest the most effective and relevant therapy activities for this clinical population. Of particular interest was the anatomical-acoustic mismatch inherent in MTS speech. A further motivation was to investigate the relationship between salient acoustic variables associated with gender identification and their perceptual correlates.
Where such relationships can be demonstrated, clinical practice may be simplified by targeting only the most salient features.

Subjects were eleven Male-to-Female Transsexuals. Acoustic samples from equal numbers of anatomical male and female speakers served as a reference. Multidimensional correlates characterizing female voice as distinct from male voice were identified. Social validation was used to determine how “successful” MTS speakers were in achieving a female voice. Results from the anatomical female and male speakers in combination with the information regarding the success of the MTS speakers were then used to suggest the most salient cues to feminine voice.

Results suggest the F0 average is the dominant characteristic in distinguishing male from female voice. Other distinguishing features include signal-to-noise ratio, F1 and the perceptual judgement of larynx height. Characteristics separating MTS speakers from both anatomical groups were the extent of F0 shift relative to habitual speaker F0 and antero-posterior tongue posture. Moderate to strong relationships for several pairs of acoustic and perceptual variables were found. The data suggest that speaker F0 and pitch are not heavily constrained by physical characteristics of a speaker.
For MTS speakers, this means that the individual’s ability to achieve a more feminine voice may not be restricted by an anatomically male vocal tract. Results of correlational analyses suggest that several acoustic variables may have influenced naive listener judgements of masculinity-femininity.

TABLE OF CONTENTS

ABSTRACT
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ACKNOWLEDGEMENTS
CHAPTER 1 INTRODUCTION
    Purpose
    Rationale and Literature Review
    Male and Female Voice Characteristics
    Speech and Voice Characteristics of Male-to-Female Transsexuals
    Methodological Issues
    Hypotheses
CHAPTER 2 METHODS
    Subject Selection
    Experimental Task Construction and Data Collection
    Analyses of Voice Samples
CHAPTER 3 RESULTS
    Overview
    Preliminary Analyses
    Stepwise Discriminant Analyses
    Correlational Analyses of Perceptual and Acoustic Variables
    Correlations Between Physical Measures and Acoustic and Perceptual Data
    Analyses of Naive Listener Judgements for the Three Speaker Groups
    Summary of Major Results
CHAPTER 4 DISCUSSION
    Overview
    Acoustic and Perceptual Cues to Gender Identification
    Acoustic and Perceptual Correlates
    Physical and Acoustic/Perceptual Correlations
    The “Success” of MTS Speakers and its Relation to Voice/Speech Features
    Methodological Considerations
    Directions for Future Research
    Conclusion
REFERENCES
APPENDICES
    1 Demographic Data for Subjects
    2 Acoustic Data
    3 Perceptual Rating Scale #1
    4 Perceptual Rating Scale #2
    5 Modified VPA Protocol used by Trained Listeners
    6 Perceptual Data
    7 Pearson Product-Moment Correlations by Variable
    8 Percentage of Interrater Agreement by Variable
    9 The Gender Categorization of MTS Speakers based on the Discriminant Analyses
    10 Averaged Naive Listener Ratings for Masculinity/Femininity
    11 Figure A: Histogram of Anatomical Male and Female Speaker F0; Figure B: Histogram of Anatomical Female, Male and MTS Speaker F0

LIST OF TABLES

    1 Reliability Scores for Acoustic Measures
    2 Intrarater Reliability for Masculinity/Femininity Ratings by Naive Listeners
    3 Percentage Interrater Agreement for Bipolar Judgements of MTS Voices by Naive Listeners
    4 Correlation Matrix of Interrater Reliability for Masculinity/Femininity Judgements by Naive Listeners
    5 Intrarater Reliability for Judgements by Trained Listeners
    6 Summary of Overall Interrater Reliability: Correlations and Percentage of Agreement by Variable
    7 Significant Univariate F Values provided in Stepwise Discriminant Analysis #1
    8 Summary Table of Results for Stepwise Discriminant Analysis
    9 Key to Conversion of Laver Scale Scores for the Purpose of Analysis
    10 Summary of Correlational Analysis of Acoustic and Perceptual Variables
    11 Spearman Correlations between Physical Features and Perceptual and Acoustic Variables
    12 Spearman Correlations between Experimental Variables and Naive Listener Judgements of Masculinity/Femininity
    13 Comparison of the Gender Categorization of MTS Speakers based on the Discriminant Analyses and Naive Listener Judgements

LIST OF FIGURES

    1 Histogram of Discriminant Scores: Two Groups with F0 included
    2 Histogram of Discriminant Scores: Two Groups with F0 excluded
    3 Relationship between Acoustic F0 Range and Pitch Range Judgements
    4 Relationship between F1 and Tongue Position (raised/lowered)
    5 Relationship between Rate of F0 Shift and Pitch Variability Judgements
    6 Relationship between Averaged Extent of F0 Shift and Pitch Variability Judgements
    7 Success of MTS Voice Change based on Naive Listener Judgements
    8 Scatterplot of the Relationship between F0 Average and Masculinity/Femininity Judgements
    9 Relationship between Listener Judgements of Masculinity/Femininity and F0 Average by Group

ACKNOWLEDGEMENTS

I wish to thank all those who generously donated their time and energy to making this project possible.

I am deeply grateful to my thesis supervisor, Dr.
Linda Rammage, who shared her experience and expertise and provided me with support and encouragement throughout.

Thank you to my voice subjects, raters, trained listeners, classmates and colleagues. Your contribution to this work is invaluable. To my friend and reliability soul-mate, Daiana D’Orazio, we made it!

To my supervising committee members, Dr. Judith Johnston and Dr. Murray Morrison, thank you for your advice, suggestions and careful editing. Special thanks to Judith, who has been my inspiration for the past three years.

And to my friends and family, especially my parents, whose continuing love, patience and support have made it possible for me to achieve my goals, I am forever grateful.

I dedicate this work to my husband, Scott.

CHAPTER 1
INTRODUCTION

In the history of the study of human voice characteristics, few studies have attempted to determine how male and female voices differ or to describe female voices. There is still relatively little data on the vocal characteristics of females, as initial research in this area consistently used male subjects. The apparent assumption was that the information derived from males would translate directly to females, the primary difference between the two genders being fundamental frequency (F0). Subsequent study has indicated that the relationship between males and females in terms of voice and speech characteristics is not that obvious. Early attempts to synthesize female voice involved increasing fundamental and formant frequencies, but unconvincing results suggested this approach was too simplistic (Titze, 1989; Kent & Read, 1992). More recent studies have suggested that there are many ways in which male and female voices differ and that further research is required to understand female voice more fully.
The database of female voice characteristics has grown steadily; however, further study is required to add to our understanding of female voice and speech.

The Male-to-Female Transsexual (MTS) population is one for which these issues are very important. Transsexualism is described by Oates and Dacakis (1983) as a problem of gender identity where a conflict exists between psychological gender and anatomical gender. These individuals wish to live as members of the opposite sex and will alter their appearance, manner and mode of dress to this end. Transsexuals will often seek out medical intervention for surgical gender reassignment. Before this procedure is performed, surgical candidates are typically placed on a regime of hormonal therapies to begin the change of secondary sex characteristics. Research has indicated that, unlike Female-to-Male Transsexuals, Male-to-Female Transsexuals will not experience significant changes in their voices as a result of hormone therapy (Oates and Dacakis, 1983; Money and Walker, 1977). Therefore, in order to achieve a more female sounding voice many MTSs make adjustments to their voice quality and vocal tract configuration and feel satisfied with the result. Those who have more difficulty may opt for surgical alteration. Most commonly in this procedure, the vocal folds are tightened, thus changing the vibratory characteristics of the glottal source and increasing fundamental frequency. Others will seek out the help of a speech-language pathologist. In the past, therapy has focused primarily on raising the F0 (Kalra, 1977). However, this may not in all cases achieve a voice that is perceived as feminine. For example, MTSs will sometimes complain that they are still mistaken for men over the telephone and in other situations where the listener is unable to see the speaker.
For these individuals, it would be desirable to implement some other form of therapy to help them attain their goal.

It has been observed anecdotally that certain MTS speakers are able to achieve a more “successful” female voice than others, success being measured in terms of how commonly the voice was assumed to be produced by an anatomical female speaker. How do these individuals, who are assumed to be constrained by anatomically male laryngeal and vocal tract structures, achieve a successfully female sounding voice? It is well documented that the healthy human vocal system is capable of producing an enormous range of sounds varying along numerous parameters including loudness, pitch, quality, and register. Therefore, anatomy need not necessarily be a limiting factor on an individual’s ability to change his or her voice. MTS speakers who are capable of producing successful female voices may reflect this intrinsic versatility in their vocal system.

There are a number of features that distinguish female from male voice. Furthermore, the existence of a relative hierarchy of importance has been proposed (Oates and Dacakis, 1983). The literature suggests that fundamental frequency and its perceptual correlate, pitch, are the most important indicators of speaker gender (Coleman, 1971; Lass, Hughes, Bowyer, Waters and Bourne, 1976). However, fundamental frequency is not a sufficient indicator in all cases. Many MTS speakers who are using fundamental frequencies well within the female range nevertheless fail to be perceived as female. Clearly, there are other features that are influencing listeners. The candidate voice characteristics that have been chosen to test this hypothesis were selected based on results of previous investigations in this area.

PURPOSE

The purpose of this study was to investigate acoustic and perceptual-acoustic aspects of Male-to-Female Transsexual speech.
The primary motivation for such a study is to determine the most effective and relevant therapy activities for this clinical population. Of particular interest was the anatomical-acoustic mismatch inherent in MTS speech. That is, how is it possible for a speaker with an anatomically male vocal tract and laryngeal structures to successfully achieve a female voice? In order to address this question, it was necessary to better understand gender markers in speech and the importance of physiological constraints on a speaker. Objectives, outlined below, were set to help interpret the MTS data.

Acoustic samples from anatomical male and female speakers served as a reference against which to compare MTS speakers. Analyses were performed in an attempt to define multidimensional correlates which characterize female voice as distinct from male voice. In addition, MTS samples were rated in terms of how “successful” the individual was in achieving a female voice on the basis of social validation. Results from the anatomical female and male speakers in combination with the information regarding the success of the MTS speakers were then used to suggest the most salient cues to feminine voice and therefore those characteristics that are of greatest importance in clinical practice.

A further motivation was to investigate the relationship between salient acoustic variables associated with gender identification and their perceptual correlates. Past research suggests that there are specific correlations between perceptual and acoustic variables; for example, a systematic relationship between changes in frequency and listener judgements of perceived pitch has been demonstrated (Borden and Harris, 1984).
Due to the complex interactions between acoustic and perceptual aspects of connected speech, the strength of these relationships is often not revealed at a discourse level. However, where these relationships can be shown to be stable, redundancy in clinical practice may be reduced by targeting only the acoustic measures or the perceptual measures rather than both in the evaluation and treatment of voice.

The study was designed to assist speech scientists, rehabilitation specialists and other professionals interested in the nature and remediation of gender-based voice problems. Results from the present study provide information with regard to the specific acoustic and perceptual features which are likely to be useful targets in therapy with individuals who are experiencing gender-related dysphonias, in particular, Male-to-Female Transsexuals (MTSs). In addition, the study will add to the existing database on speaker gender identification and will therefore provide information for those professionals in the field of voice synthesis attempting to more accurately replicate male and female voices. The results may also have implications for working with individuals who have undergone laryngectomies.

RATIONALE AND LITERATURE REVIEW:

Male and Female Voice Characteristics

Gender markers vs. gender stereotypes:

Sociolinguistics is a field that has directed considerable attention to the study of gender specific characteristics of speech and language use. A number of researchers from this background have proposed speech stereotypes to explore gender-related speech and language differences. They have shown that there are perceived differences in the speech and language used by men and women and that the belief system which appears to govern people’s judgements of gender appropriateness is acquired in childhood (Siegler and Siegler, 1976; Edelsky, 1976b; Haas, 1979; Tannen, 1990).
Research undertaken in the 1970s suggested that women’s speech was commonly believed to be characterized by more euphemisms, politeness forms, and apology; female manner of interaction was felt to be nonassertive, tentative and supportive. Male speakers were thought to use more slang and profanity; stereotypical male interaction styles include lecturing, arguing, debating, asserting and commanding (Haas, 1979). In addition, it was felt that women spoke on different topics than men; for example, women were thought to talk more about the home and relationships, while men were more likely to talk about sports, money and business.

Some investigators attempted to find empirical evidence that such stereotypes actually existed in the language used by the sexes. Aries (1976) found that in all-male group interactions conversational themes included aggression, competition, victimization, and practical joking. All-female groups were more likely to share information about themselves, their feelings, their homes, and their interpersonal relationships. As Kramer (1977) observed, “it appears that commonly held expectations females and males have of their sex and of the other sex have an impact on their daily interactions with each other”.

More recent research has supported these findings (Tannen, 1990). Tannen has contributed extensively to the study of sociolinguistic differences that exist between female and male speakers. Tannen (1990) explores the asymmetries of male and female conversational styles and the consequent misunderstandings that arise. She considers the ways in which perceived differences compare with actual gender distinguishing features.

Clearly, there are other factors interacting with speaker gender that will in part determine the style of speech used, for example, age, culture, socioeconomic status, geographical region and so forth.
For many of the characteristics perceived to be associated with a given gender, there is little evidence to support their existence in actual language use. Haas states “there is no evidence that any linguistic feature is used exclusively by one sex in our society; variations have been found only in the frequency of production” (Haas, 1979). Thus, the stereotyped understanding of male and female speaker differences does not allow us to accurately delineate the sexes. Although these stereotypes are of interest from a psychosocial point of view, they are of minimal diagnostic value. For the purposes of this investigation it is necessary to separate characteristics which are popularly believed to be specific to one or the other gender from those which through scientific observation and study have been shown to distinguish female from male speakers. These characteristics have been referred to as “speech markers” (Smith, 1979).

The distinction between speech markers and speech stereotypes is an important one for the MTS population. Oates & Dacakis (1983) provide a comprehensive review of the current issues and directions in voice treatment for MTSs. According to these authors, “speech markers refer to empirically determined linguistic features which differentiate male and female speech. Speech stereotypes denote features which, regardless of their linguistic reality, have become associated with and expected of men and women” (Oates and Dacakis, 1983). Both types of speech features play a role in the way the speech-language pathologist will proceed with treatment.

Each treatment-seeking individual comes to the therapy programme with personal beliefs about male and female speech characteristics. These beliefs will influence the expectations that individuals have about their own voice and treatment. This, in turn, will affect the client’s goals in therapy.
The assumptions MTS clients are making are likely to be reflected in the manner in which they alter their voice, speech and language to sound female. Unfortunately, it is possible that not all their assumptions correspond with the empirically determined facts about female voice. It is, therefore, of great importance to increase awareness of the difference between stereotypical features of female speech and those features that have been demonstrated to distinguish female from male voice.

As reported by Smith (1979), the following are areas of speech and language use for which empirical evidence exists to suggest gender differences: pronunciation, grammatical forms, vocabulary use, speech style, code use and dialect. These characteristics exist at a level which Smith refers to as the “segmental level”; that is, these features of speech are bound to the morphophonemic units of a language. In addition to differences at a segmental level, Smith discusses “nonsegmental” differences. These refer to characteristics not immediately defined by the meaningful units of speech, including pitch, intonation and paralinguistic elements such as loudness, pausing and voice characteristics. Intuitively, nonsegmental, or suprasegmental, speech characteristics would seem to be the primary cues to speaker gender. Before the message and manner of communication have been received and analyzed, suprasegmental information has been processed and has already set the parameters for the interpretation of what has been said. Suprasegmental features are therefore arguably the most important cues to speaker gender and they will be the focus of this study.

It has been well documented in the literature that speaker fundamental frequency (F0) and its perceptual correlate, pitch, are the most important suprasegmental cues to gender identification (Coleman, 1976; Lass et al., 1976). However, even when F0 falls in the ambiguous area on the border between the masculine and the feminine ranges (i.e.
150-160 Hz), listeners are able to make a gender judgement (Spencer, 1988). Similarly, when the voice source is removed or controlled in some other way so that it is not available to the listeners as a cue to gender identity, they are still able to judge the speaker as male or female (Weinberg & Bennett, 1971; Coleman, 1973, 1976). These findings suggest that acoustic features other than F0 may be contributing to the perception of speaker gender.

Anatomical and Acoustic Correlates of Gender:

The basic physiology of the vocal tract is one factor which will influence the F0 used by a speaker; that is, we know that male and female vocal tracts differ with respect to length, mass and configuration of the vocal folds and therefore, are likely to produce correspondingly different acoustic outputs (Hollien, 1960a, 1960b; Titze, 1989 & 1994; Coleman, 1971; Laver, 1980). Researchers have found that there is a direct relationship between laryngeal size and pitch perception; that is, the larger the laryngeal structure the lower the pitch (Hollien, 1960a). Similarly, vocal fold thickness has been demonstrated to be related to F0. Hollien (1962) found that as the F0 of phonation was raised, the mean thickness of the vocal folds was systematically reduced. However, it is not clear how restrictive the influence of physical factors is on speech production. Titze (1994) states, “The fact that the human vocal fold is made up of both of active and passive tissue results in great flexibility in FO control”.

There have been a number of studies which have highlighted the inconsistencies between predicted acoustic measures and physical measures. Although acoustic theory gives researchers a reference point from which to estimate various acoustic measures, these values are not always supported by actual results (Rammage, 1990).
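The kind of reference point that acoustic theory supplies can be illustrated with the standard textbook idealization of the vocal tract as a uniform tube, closed at the glottis and open at the lips, whose resonances fall at odd multiples of c/4L. The sketch below is an illustration only and is not taken from the thesis: the function name, the 17.5 cm tract length and the 350 m/s speed of sound are assumed values of the kind commonly quoted for a neutral adult male tract.

```python
def tube_formants(length_cm=17.5, speed_of_sound_cm_s=35000.0, count=4):
    """Resonances (Hz) of a uniform tube closed at one end and open at the other.

    Quarter-wavelength model: F_n = (2n - 1) * c / (4 * L).
    The default length and speed of sound are assumptions for illustration.
    """
    return [
        (2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
        for n in range(1, count + 1)
    ]


# Neutral tract under the assumed dimensions.
neutral = tube_formants()

# A shorter tube (hypothetical 15 cm), as a crude stand-in for a raised larynx
# or a shorter vocal tract; every resonance moves upward in this model.
shorter = tube_formants(length_cm=15.0)

for n, (f_neutral, f_short) in enumerate(zip(neutral, shorter), start=1):
    print(f"F{n}: {f_neutral:.0f} Hz (17.5 cm)  {f_short:.0f} Hz (15.0 cm)")
```

Under these assumptions the model places the resonances of the 17.5 cm tube at 500, 1500, 2500 and 3500 Hz, and at uniformly higher values for the shorter tube. Real vocal tracts are not uniform tubes, so measured formants can be expected to depart from such idealized predictions.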
For example, acoustic theory predicts that the formant frequencies for a central vowel produced by a neutral vocal tract setting would have the following values: F1: 500 Hz; F2: 1500 Hz; F3: 2500 Hz; F4: 3500 Hz. It is reasonable to predict that by raising or lowering the larynx, effectively shortening or lengthening the vocal tract, one should see a systematic shift in formant frequencies; however, this expected result has not been consistently demonstrated. It is not surprising that current research reveals there is no simple correspondence between changes in laryngeal height and formant frequencies. The vocal tract is a complex system of interconnected mechanisms. In manipulating larynx height, changes will occur not only in the location of the larynx but also in the tension of the surrounding musculature, which in turn affects the degree and location of constriction within the vocal tract and possibly vocal source vibratory patterns (Laver, 1980; Sundberg and Nordstrom, 1976). Voice scientists continue to study the extent of restriction imposed by speaker anatomy and physiology. Research to date demonstrates the inherent flexibility in the human vocal system.

The independence of acoustic and physical measures has been observed by other researchers. In a study of physical and social correlates of speaking fundamental frequency, Graddol and Swann (1983) found, “the limits dictated by physical characteristics of the larynx may not, in practice, be the main, determining factors of the fundamental frequency used in normal speaking”. They suggest that the human vocal tract is capable of producing an enormous range of frequencies. However, they also point out that despite the fact that it is possible for speakers to produce a wide range, normal speaking is restricted to a much narrower range of frequencies.
Thus, although individual physical structure may be the ultimate limiting factor in the total range of F0 a speaker is able to produce, there is considerable flexibility within this range and the range of F0 actually used may be influenced by other factors such as cultural expectations and speaker position in society. This observation has been supported by Linke (1973) who found that women in this study used a lower median F0 than men, consequently reducing the range of speaking frequencies available to them. Linke cites societal pressures as the possible explanation for this tendency.

Salient Acoustic and Perceptual Cues to Gender Identification:

In addition to the physical constraints, there may be certain characteristics of vocal fold vibration and other aspects of vocal tract resonance that could contribute to gender identity: voice quality features, such as harshness and breathiness, or resonance features such as formant frequencies. As well, there may be sex-specific phonetic and prosodic effects that aid in listener identification, a possibility investigated by Ingrisano, Weismer and Schuckers (1980) in their study of preschool children’s voices. Past research in the area of voice characteristics and gender identification has shown the importance of the vocal tract filter, which appears to play a major role in allowing listeners to determine speaker gender (Coleman, 1976; Brown and Feinstein, 1977). When F0 was neutralized so listeners did not have this cue to use, it was still possible for them to make a gender judgement.
Clearly, there are other aspects of voice and speech that can aid a listener in determining a speaker’s gender.

Influence of the Vocal Tract Filter on Gender Identification:

A number of studies have attempted to answer the question of whether listeners are able to judge speaker gender in the absence of the primary cue, fundamental frequency, and what the other significant cues might be. Researchers have hypothesized that when the information provided by the glottal source is controlled for, listeners are able to use vocal tract resonance characteristics to determine speaker gender (Coleman, 1971 & 1977; Brown and Feinstein, 1977; Sachs, 1975; Ingemann, 1968; Ingrisano et al., 1980; Lass, Hughes, Bowyer, Waters and Bourne, 1976; and Weinberg and Bennett, 1971). Furthermore, studies by Peterson and Barney (1952) and Ladefoged and Broadbent (1957) have demonstrated that males have lower average vowel formant frequencies than females, i.e. that there were clear differences in formant frequencies between the sexes.

In order to evaluate the contribution of these potentially distinguishing features, it was necessary for investigators to neutralize or isolate F0 from the voice samples to be analyzed. Brown and Feinstein (1977) obtained voice samples of male and female speakers while using an electronic artificial larynx with a constant F0 in place of their normal voice. These samples were then rated by untrained listeners. They hypothesized that “there are other sex-related differences in the supraglottal vocal tract which produce discriminable acoustic differences in the speech signal” (Brown and Feinstein, 1977). Results showed that when the glottal source was controlled, listeners were still able to identify speaker gender with 76% accuracy.
These researchers interpret their results as follows: “It would appear from the spectrum shifts (which are a result of the resonance characteristics of the vocal tract) associated with the male and female voice identifications that vocal tract size, especially length and shape, may play an important role in identification of speaker sex” (Brown and Feinstein, 1977). Although Brown and Feinstein suggest that their findings are related to physical parameters, they do not provide any information about the size of their subjects. The gender-specific differences seen in their data could also be interpreted as being the result of muscular tension within the vocal tract or the relative positioning of articulators.

Coleman (1971) also analyzed recordings of subjects using an electrolarynx set at 85 Hz. The listeners were asked to judge the sex of the speaker and to give the confidence with which their decision was made on a 7-point scale. The vowels were analyzed spectrographically and frequencies for formants 1, 2, and 3 were obtained. The overall vocal tract resonance characteristics of each subject were then compared to the degree of maleness/femaleness indicated by the listeners. Coleman found that using an 85 Hz electrolarynx, it was somewhat easier for the judges to identify the males. More females were mistaken as being male. Coleman offers this explanation, “In cases where other differences were minimal, the low frequency of the electrolarynx may have given a male quality to the voice which was stronger than any other acoustic cue to female voice quality” (Coleman, 1971). Coleman found a statistically significant relationship between perception of male and female voice and formant frequencies. In addition, there was some overlap of formant frequencies for speakers who were correctly identified as male or female, i.e.
some males had formant frequencies which fell within the female range and vice versa.

Other researchers have attempted to control for F0 by having subjects whisper or produce only voiceless consonants. Lass et al. (1976) investigated the relative importance of laryngeal F0 and vocal tract resonance characteristics by asking listeners to judge the gender of speakers producing sustained vowels under three experimental conditions. Results showed that listeners were 75% accurate in making gender judgements in the absence of F0. These findings demonstrate the ability of listeners to make use of vocal tract resonance information when they cannot use F0.

An earlier study also considered the contribution of the vocal tract in gender identification. Ingemann (1968) hypothesized that the accuracy with which a listener could identify speaker gender was dependent upon the portion of the vocal tract anterior to the place of constriction: the longer this portion is, the more easily gender may be determined. Ingemann asked phonetically trained men and women to produce nine different voiceless fricatives. Results showed that [h] was the phoneme that best allowed the identification of speaker gender, while accurate gender identification using front fricatives was little better than chance. Ingemann’s data supported the initial hypothesis stated above; they offer further support for the theory that vocal tract resonance can be used in the absence of F0 to identify the gender of the speaker.

Another group of researchers have looked at how the removal of the glottal source, as in the case of individuals who have undergone a laryngectomy, affected gender identification. Weinberg and Bennett (1971) asked listeners to identify the sex of esophageal speakers. When the vocal source is removed, it is more difficult to achieve a voice within the normal female pitch range (Weinberg & Bennett, 1971). Nevertheless, they found that three out of four female esophageal speakers were correctly identified as female.
Still other researchers have addressed this issue by looking at the voices of children. Ingrisano et al. (1980) were interested in discovering whether listeners could identify the gender of 4- to 5-year-old children. It is suggested in this study that, unlike that of adults, the anatomy of male and female children is not significantly different for children of similar height and weight. Thus, the anatomical differences assumed by Brown and Feinstein to dictate vocal tract resonance characteristics are controlled for. Given minimal anatomical differences, one would expect that listeners would not be able to distinguish male from female voices. However, these researchers found that listeners were able to correctly choose the gender from the children’s voices with 70.7% accuracy.

Formant values reflect the size, mass, and configuration of the vocal tract (Kent and Read, 1992). Researchers have shown that certain formant frequencies are sensitive to articulatory postures (Borden and Harris, 1984; Nittrouer et al., 1990). For example, F1 is known to be most responsive to changes in mouth opening; F2 is most responsive to changes in the oral cavity, i.e. tongue position (fronted vs. backed). These articulatory gestures are presumably reflected in the perceptual speech characteristics. The acoustic research reviewed has shown that there are reliable gender-specific differences between the vowel formant values of female and male speakers. The analysis of vowel formants in this investigation is of interest because it may show correlations with various perceptual features. For example, it is possible that particular articulatory manipulations will be reflected in the pattern of resultant formant frequencies; fronted articulation or a raised larynx may be indicated by a systematic shift of formant frequencies. Evidence exists to suggest that listeners are able to judge whether a speaker is using a fronted tongue position or a raised larynx (Laver, 1980).
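As a rough illustration of why a shorter vocal tract (for example, one shortened by raising the larynx) would be expected to shift formants upward, the sketch below uses the textbook idealization of the vocal tract as a uniform tube closed at the glottis and open at the lips, with resonances at F_n = (2n - 1)c / 4L. The tube model, the speed-of-sound value, the function name and the two tract lengths are assumptions chosen for illustration; none of them is taken from the studies reviewed here.

```python
# Idealized quarter-wavelength resonator: a uniform tube closed at one end
# (glottis) and open at the other (lips). This is an illustrative assumption,
# not a model used in the studies cited above.
SPEED_OF_SOUND_CM_S = 35000.0  # assumed value for warm, moist air


def tube_formants(length_cm, n_formants=3, c=SPEED_OF_SOUND_CM_S):
    """Return the first n_formants resonances (Hz) of a uniform tube.

    F_n = (2n - 1) * c / (4 * L), with L the tube length in cm.
    """
    if length_cm <= 0:
        raise ValueError("length_cm must be positive")
    return [(2 * n - 1) * c / (4.0 * length_cm)
            for n in range(1, n_formants + 1)]


if __name__ == "__main__":
    # Hypothetical tract lengths, chosen only to show the direction of the shift.
    for length in (17.5, 15.0):
        formants = [round(f) for f in tube_formants(length)]
        print(f"{length} cm tube: {formants} Hz")
```

Under these assumptions a 17.5 cm tube resonates at 500, 1500 and 2500 Hz, and shortening it to 15 cm raises every resonance by the same proportion. Real vocal tracts are not uniform tubes, so this shows only the direction of the expected shift.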
Relationships between perceptual and acoustic features will be investigated further in the present study.

Features of the Laryngeal Source influencing Gender Identification

Several studies have considered voice quality and laryngeal vibratory characteristics independent of fundamental frequency in an effort to understand female and male voice (Titze, 1989; Leddy, 1989; Sorensen & Horii, 1983; Price, 1989; Nittrouer, McGowan, Milenkovic and Beehler, 1990; Klatt & Klatt, 1990; Kent & Read, 1992). Based on these studies, the female voice is characterized by breathiness, higher F0, a different range of F0, higher formant frequencies and greater interaction between source and filter (Kent and Read, 1992).

Breathy voice quality has been stereotypically associated with female voice. However, empirical evidence exists to suggest that the feature may indeed be a gender marker. Several different acoustic measures have been related to breathiness. Klatt and Klatt (1990) examined breathiness in two ways. The first compared listener judgements of breathiness with acoustic measures to examine relationships. These researchers found that female voices were typically rated as breathier than male voices, although there were large individual differences within genders. Of nine acoustic parameters examined, only two were significantly correlated with the judgements of breathiness: the degree of aspiration noise present in the higher formants of vowels and the relative strength of the fundamental component. In the second analysis, this information was used in conjunction with a formant synthesizer in an attempt to imitate the normal voices. The synthesized voices were judged for various voice qualities. The authors suggested that perceptually, breathiness is signalled by a large number of acoustic cues that are associated with a posterior glottal opening, or “glottal chink”, during phonation.
They found that by manipulating parameters that have been shown to have an effect on voice quality, they were able to achieve a set of voice qualities ranging from laryngealized to normal to breathy. Results of this study suggest that one of the most important ways in which male and female voice differ is in the amount of aspiration noise.

Nittrouer et al. (1990) demonstrated similar results indicating the importance of certain vocal source vibratory characteristics differentiating the genders. Unlike the present investigation, this study was designed to look at how laryngeal activity relates to acoustic output. They analyzed voice samples using CSpeech to obtain jitter, shimmer and signal-to-noise ratio (SNR) values for male and female speakers. Again, the amount of aspiration noise in female voice was significantly greater than that in male voice. These authors suggest that SNR is a possible indicator of breathiness since it is assumed to be associated with inefficiency of vocal fold vibration. SNR is the ratio of total energy in the acoustic signal produced by the vocal folds to energy in the noise component and as such, may vary with amount of breathiness in the voice. Results of this study provide further support for the importance of breathiness and SNR in gender identification.

Gender-related differences in jitter and shimmer have been investigated in several studies, but results are ambiguous. Sorensen and Horii (1983) report significant differences in both measures for sustained vowels produced by male and female speakers. They found that female voice is characterized by less vocal shimmer and greater jitter than male voice. However, these results appear to contradict those found by Nittrouer et al. (1990). Their data suggested that male jitter values were considerably higher than female and they found no speaker-sex effects for shimmer. In addition, Nittrouer et al. (1990) found that both jitter and shimmer covaried with other acoustic variables.
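To make the perturbation measures discussed here more concrete, the following sketch computes one commonly used definition of each: local jitter (the mean absolute difference between consecutive glottal periods, relative to the mean period), local shimmer (the same calculation applied to cycle peak amplitudes), and SNR expressed in decibels from a signal energy and a noise energy. These definitions, the function names and the example values are assumptions for illustration only. The formulas actually used by CSpeech or by the studies cited above are not specified in this review and may differ.

```python
import math


def local_perturbation(values):
    """Mean absolute cycle-to-cycle difference as a percentage of the mean value."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    mean_diff = sum(diffs) / len(diffs)
    mean_value = sum(values) / len(values)
    return 100.0 * mean_diff / mean_value


def jitter_percent(periods_ms):
    """Local jitter: cycle-to-cycle variation in period duration."""
    return local_perturbation(periods_ms)


def shimmer_percent(peak_amplitudes):
    """Local shimmer: cycle-to-cycle variation in peak amplitude."""
    return local_perturbation(peak_amplitudes)


def snr_db(signal_energy, noise_energy):
    """Signal-to-noise ratio in dB, from two positive energy values."""
    if signal_energy <= 0 or noise_energy <= 0:
        raise ValueError("energies must be positive")
    return 10.0 * math.log10(signal_energy / noise_energy)


if __name__ == "__main__":
    # Made-up cycle measurements, not data from any study.
    periods_ms = [5.0, 5.1, 4.9, 5.0]
    amplitudes = [1.00, 0.98, 1.02, 1.00]
    print(f"jitter  = {jitter_percent(periods_ms):.2f} %")
    print(f"shimmer = {shimmer_percent(amplitudes):.2f} %")
    print(f"SNR     = {snr_db(100.0, 1.0):.1f} dB")
```

Because jitter and shimmer are both computed from the same sequence of glottal cycles, any irregularity in cycle detection affects both measures together.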
Jitter was found to be strongly correlated with SNR in male voice, while shimmer values covaried with F0 for all speakers. These results call into question the value of jitter and shimmer values as independent indicators of voice quality features.

Another reported distinction between the genders is prosody or intonation. This suprasegmental feature is conveyed through modifications in fundamental frequency, duration and intensity. The literature concerning gender-based difference in intonation patterns is difficult to interpret because of the nature of this speech characteristic. Intonation is as much tied to emotion, social status, and cultural experience as it is to gender. Many of the studies demonstrating male and female differences in intonation patterns reflect biases of different cultures and generations, making it difficult to generalize results. Nevertheless, there is reason to believe that prosodic features may provide important cues to gender identity. Brend (1972) reported gender-specific intonation patterns for male and female midwestern American speakers. The author suggested that, unlike women, men would avoid using the highest level of their pitch range and that women used more contrastive levels of intonation than men, suggesting greater intonational variability. Pellowe and Jones (1978) also found that women demonstrated a greater variety of intonational patterns. There are obvious differences in the F0 ranges and F0 variability used by speakers in general and evidence exists to suggest that these differences may demonstrate a gender effect.

Terango (1966) investigated pitch and duration characteristics of male voice on a masculinity-femininity dimension. This study considered the masculine-feminine characteristics of voice from a slightly different perspective. Only male speech samples were used and analyzed for F0 and intonational characteristics.
The voices were rated perceptually along a masculinity-femininity continuum and were then grouped into one of two categories, “masculine” and “effeminate”. Results showed that two of the measurements differentiated between the groups: mean speaker F0 and the mean rate of pitch change during inflections. The largest difference between the two groups was for upward inflections. Apparently these characteristics in the voice of anatomical male speakers had the effect of creating what Terango refers to as an effeminate voice quality. This finding has implications for the treatment of MTS clients. Ideally treatment should eliminate effeminate voice quality, replacing it with a more perceptually natural female voice. The issue of prosody gender differences and its relevance to the characterization of MTS voice will be discussed in Chapter 4.

The Relationship between Acoustic and Perceptual Variables

Whenever we hear another person’s voice, we are able to classify that voice by comparing it to our own understanding of human voice based on all the voices we have heard over our lifetimes. In the majority of cases, it seems obvious to us at once whether the voice has a high or low pitch, if the voice belongs to a child or an adult, if it is a healthy voice or if it sounds like the speaker has a sore throat and so on. It is not as clear exactly how we are able to make these decisions. Given the diversity of human voice characteristics it is remarkable that we are able to judge voices as readily as we are. It appears that there must be certain salient perceptual cues that the listener may draw from a speaker’s voice which allow for particular judgements to be made. Professionally trained or not, listeners seem to have some sense of gender-appropriate speech and voice characteristics.
However, because there will be a natural variability in expectations and perceptual definitions of voice characteristics between listeners and even in the same listener at different times, it is desirable for the clinician working with pathological voice conditions to be able to relate perceptual information to objective measures.

There is an underlying assumption that acoustic features of voice/speech are related to perceptual voice/speech qualities. This being the case, it is reasonable to expect that specific modifications to the acoustic signal will have a direct influence on perception. In the majority of the above cited studies there has been an attempt to understand more about the nature of female and male voice and speech by selecting specific objective measures and comparing these systematically with listener judgements. Results have revealed the complex nature of the relationship that exists between acoustic and perceptual variables. As reported above, recent studies have demonstrated relationships among perceptions such as breathiness and select acoustic measures (Klatt & Klatt, 1990; Nittrouer et al., 1990; Wolfe & Ratusnik, 1988). For these reasons, the present study will include an investigation into how listener perceptions might be related to the acoustic parameters of interest.

Speech and Voice Characteristics of Male-to-Female Transsexuals

A review of previous research in MTS voice provides further direction for methodological issues and salient voice features to be considered in the present study.

Treatment Efficacy Studies of MTS Voice:

There are relatively few studies that are specifically concerned with MTS voice. Bralley, Bull, Harris, Gore and Edgerton (1978) and Kalra (1977) are authors of two of the earlier studies in this field. Both studies examined the effectiveness of voice therapy with an MTS client. Therapy consisted primarily of increasing F0 and the effective pitch range used.
These treatments have demonstrated a certain degree of success, but as Bralley et al. (1978) point out, other aspects need to be incorporated as a change in F0 alone is not sufficient to indicate female gender.

Experimental Studies of MTS Voice:

Recent experimental investigations have been designed to obtain more data for this population. Spencer (1988) provides an acoustic and perceptual study of the voice characteristics of MTS speakers. The purpose of this work was to determine the extent to which MTS speakers were able to adopt female speech characteristics. MTSs selected for the study were those who had made changes to their speech patterns. Seven males and seven females were chosen as controls to serve as a standard against which to compare the MTS speakers. Each subject was matched with a male and female control with respect to height, weight and age. All subjects were recorded reading the first two sentences of the Rainbow Passage. Naive listeners were asked to rate the voices as male or female and then to indicate the degree to which that particular voice was representative of the sex chosen. An acoustic analysis was performed to obtain the F0 for each speaker for the voiced portions of the samples.

With respect to the measure of F0 and its relation to perceived sex, Spencer found that, typically, speakers using a fundamental frequency above 160 Hz were perceived as female and those with an F0 below 160 Hz were perceived as male. She also found a strong linear relationship between F0 and gender representativeness scores: “higher scale scores for femaleness were associated with higher vocal fundamental frequencies, and higher scale scores for maleness corresponded to lower vocal fundamental frequencies” (Spencer, 1988).

This study provides further support for fundamental frequency as the most salient cue to speaker gender identity. However, not all subjects with an F0 higher than 160 Hz were judged to be female. Two of the MTS subjects with F0 within the female range, i.e.
165 and 189 Hz, were not judged to have highly representative female voices. Spencer concludes, therefore, that “auxiliary cues to speaker sex must have been available to the listeners which weakened the effect of the fundamental frequency for these 2 individuals” (Spencer, 1988). Spencer found that four anatomical males who had intentionally altered their speech patterns were indeed perceived as being female. This is at odds with a number of studies, for example Coleman (1976), that have found that male vocal tract resonance characteristics are not disguised even when the F0 of the vocal source is raised into the female range.

Spencer suggests that the “success” of certain MTSs may be due to their height; she notes that none of the MTS subjects who were identified as female were over 5 ft 8 in. It is possible that raising the F0 was complemented by smaller stature to achieve a more feminine voice. Assuming that shorter people typically have correspondingly shorter vocal tracts, acoustic theory predicts that the associated formant frequencies will also be higher. However, Spencer found evidence that contradicts this prediction: the tallest female control was consistently rated as female, while several shorter MTS subjects were not felt to have speech patterns representative of a female. Spencer postulates a possible “deliberate manipulation of vocal tract configuration by the successful MTS speakers” (Spencer, 1988). The author points out that most MTSs report changing more than the pitch of their voice. They may change the precision and prominence of consonant production, as well as altering voice quality so that it is “softer”. They also report attempting to mimic female intonation patterns.

Other researchers have attempted to provide information to help direct therapeutic intervention with the MTS population (Coleman, 1983; Wolfe, Ratusnik, Smith and Northrop, 1990; Bralley et al., 1978; Kalra, 1977).
Of particular importance to the present investigation is the study by Wolfe et al. (1990). These researchers were interested in differences in the use of F0 and intonation patterns for MTSs who were rated as having male or female voices. They observed that there appears to be a range of F0s that would be considered representative of a given gender category. It is therefore important to those working with MTS clients to have a sense of where the cutoff might be between the categories. As indicated by Spencer (1988) there appears to be a categorical boundary around 160 Hz that listeners may be using.

The purpose of the Wolfe et al. (1990) study was to find the lowest F0 associated with those MTS speakers identified as female and to see if particular intonation patterns were associated with those speakers who were judged to be female. Listening tapes were prepared of the speech of MTS subjects in addition to that of anatomical male and female speakers, who were included as a frame of reference. Naive listeners made a categorical decision about the gender of each speaker. The voices were then rated along a seven-point masculinity/femininity equal-appearing interval scale. The samples were analyzed acoustically to determine average F0 and intonation patterns. The authors defined an intonation as “a frequency modulation or change either upward or downward without interruption of phonation” (Wolfe et al., 1990). They included a second measure of intonation patterns that they refer to as a “frequency shift”. Shifts are described as “a change in frequency that takes place between the terminal frequency of a given phonation and the initial frequency of the subsequent phonation” (Wolfe et al., 1990).

Results of this investigation demonstrated that for those MTS speakers who were categorized as female, F0 average was significantly higher.
They report an average F0 of 172 Hz, ranging from 155.6 to 195 Hz, for MTSs judged to be female, while F0s for MTS speakers judged to be male ranged from 97.2 to 145 Hz, with an average of 118 Hz. The data indicate that those voices perceived as feminine were characterized by a higher percentage of upward intonations and downward shifts, with fewer level intonation patterns, than those voices judged to be masculine. This lends further support to the observation that female voice is less monotonous than male voice. Wolfe et al. (1990) make the following observation: “Noteworthy was the fact that the rated-female [speaker] with the lowest average F0 (155 Hz) also had one of the highest (most feminine) ratings on the masculinity-femininity dimension (5.13). Very possibly, this individual’s intonation pattern, which was characterized by a high percentage of upper intonations (25.45) and a low percentage of level intonations (3.64) contributed to the high femininity rating”. This suggests that in cases where the fundamental is providing ambiguous cues to gender identity, factors such as intonation patterns may be contributing to the representation of female voice. Information provided by these researchers will be incorporated into the experimental design of the proposed study.

Methodological Issues

Anatomical controls and subject selection:

Anatomical females and males were included in this study to serve as a reference against which to compare MTS voice and speech characteristics. Spencer (1988) used female and male controls for a similar purpose. In her study, Spencer investigated the extent to which Male-to-Female Transsexuals were able to achieve female speech characteristics after a period of vocal training. The controls in Spencer’s study were matched to the subjects with respect to height, weight, and age.
As observed in previous studies, these physical characteristics may be detected in a speaker’s voice and may therefore interfere with the perceptual judgements made by listeners (Lass & Davis, 1976; Ptacek & Sander, 1966; Benjamin, 1981). However, from anecdotal observation and research that has presented evidence to the contrary, there is reason to suggest that these physical characteristics are not readily detected in a speaker’s voice and therefore will not have a confounding effect on analyses. For example, it is not unusual to find a tall soprano singer or a short tenor. It is also true that regardless of vocal fold length or mass, humans are capable of producing wide frequency ranges. Graddol and Swann (1983) found inconclusive results in their review of studies attempting to show that listeners can judge speaker height and weight based on speaker fundamental frequency. For the purposes of this study, subjects were selected to include a range of weights and heights representative of the general population while controlling for the physical characteristic that, from the research, appears to be most consistently related to voice change, that is, age.

Benjamin (1981) found that the voices of elderly speakers, i.e. speakers above the age of 68, differ from those of younger speakers in the following ways: older speakers produced lower pitches, larger intonational ranges, and greater numbers of inflections than did younger speakers when reading a short passage. In addition, Benjamin found that frequency perturbation in older speakers was significantly greater than that of younger speakers. Other studies also suggest that there are noticeable differences in the speech of older and younger individuals, such as degree of hoarseness and breathiness, as well as differences in the rate of speech (Ptacek & Sander, 1966; Hartman & Danhauer, 1976).
Although the task in Benjamin’s study differed from that of the present investigation, there is enough evidence to suggest that perceptual and acoustic measures may be confounded by the inclusion of elderly individuals.

Perceptual Studies: Rating Scales and VPA

Recently, there has been considerable interest in the creation of reliable tools for use in rating perceptual characteristics of voice. As a result, researchers have proposed new protocols and theoretical frameworks from which to approach the evaluation of perceptual features. Gelfer (1988) created a set of perceptual rating scales which were intended to provide a comprehensive and flexible way of describing normal voice. She began by choosing a large number of descriptors commonly used in the field. Individuals in voice-related professions were then used to determine appropriate opposites for these descriptors and, eventually, to narrow these down to 17 bipolar adjective pairs. The reliability and confidence with which raters could use these scales were tested by having both speech-language pathologists and a control group of untrained listeners rate a number of voices. Gelfer found that both trained and untrained listeners could use these scales with a good degree of consistency and confidence. Since, for the purposes of this study, it was important for untrained listeners to be able to rate the voices reliably, the rating scale was constructed using five of the bipolar adjectives shown to have good group consistency.

In addition to these five perceptual voice attributes, one further bipolar scale was added: masculine - feminine. Spencer (1988) used a similar measure as an indication of how representative a voice was of a particular gender. In the present study, this scale served essentially the same function; however, the rating procedure used by Spencer was somewhat different. In that study, raters were told they were going to listen to a group of men and women reading a passage.
They were asked to judge whether the speaker was male or female. They were then asked to indicate on a seven-point scale “the degree to which each voice was representative of the chosen sex” (Spencer, 1988). In this study, to ensure unbiased judgements were made, listeners remained naive to the purposes of the investigation.

Due to the variability across listeners in the perception of voice characteristics, there has been some controversy regarding the validity of perceptual measures (Kreiman, Gerratt, Kempster, Erman and Berke, 1993; Bassich & Ludlow, 1986). Kreiman et al. (1993) reviewed the types of perceptual rating systems which have been used in recent work and they propose a framework for the perceptual evaluations of voice characteristics in future studies. They report that results of perceptual studies have demonstrated great fluctuations in inter- and intrarater reliability. The studies included in their review were characterized by generally weak methodology. As these authors point out, in order for perceptual ratings to be meaningful there must be consistency in the use of the rating scale or protocol used. To ensure this consistency, they suggest that a clear theoretical approach is necessary. They further suggest that for listeners to be effective, it is important that they develop internal standards for vocal qualities based on their own experience with voices. Because the majority of listeners have extensive experience with “normal” voice quality, their internal representation will be relatively stable. Experience with pathological voices will vary considerably across listeners; therefore, there may be significant differences in how listeners judge the severity of these voices. Because of this discrepancy, they suggest that naive listeners will have a tendency to judge voices differently than trained listeners, judging pathological voice relative to an internal scale more appropriate to normal voices.
Results of their study demonstrate that it is preferable for listeners to have extensive training in the use of protocols which use fixed reference voices, replacing idiosyncratic, internal standards with a stable set of perceptual referents.

The Vocal Profile Analysis Protocol (Laver, 1986; 1991) was developed to provide a systematic, theory-driven way of describing normal and pathological voice. This protocol was developed to take into account the perceptual-acoustic characteristics of the vocal tract as a whole. It assumes that the lips, jaw, and tongue have as important an influence on voice quality as the larynx or velopharyngeal system. It looks at voice quality in terms of long-term tendencies for a particular part of the vocal tract to be held in a certain position, called a “setting”. Voices are rated in comparison to a well-defined baseline, the neutral position. This position is an abstract reference quality and is not meant to be taken as a description of normalcy; as noted in the User’s Manual, “Almost all speakers deviate from neutral in some way” (Beck, 1991). Raters are required to decide if and to what degree a particular setting of the vocal tract deviates from the neutral position. Training includes listening to recorded samples exemplifying the various settings and degrees of deviation from neutral and incorporates motor theories of perception by training judges to learn to produce the qualities being assessed. Thus, the protocol satisfies the conditions for reliability proposed by Kreiman et al. (1993). The VPA provides a systematic and therefore more reliable way of making perceptual judgements. In addition, it was designed to be used in describing both normal and pathological voices.
For these reasons, it was considered to be the most useful perceptual protocol for the purposes of this study.

As noted above, the tradition of voice research has focused primarily on the study of male voice; but it is also important to point out that these analyses were performed in controlled situations in which various characteristics of voice were considered in isolation. Because of the inherent difficulty in analyzing voice characteristics, experimental procedures were usually performed on sustained vowels or other small segments such as syllables. With the advent of more sophisticated computer software to facilitate the analysis of specific voice features, it has been possible to expand the scope of investigation to more naturalistic voice samples. Current programmes such as CSpeech have been developed to provide, and are capable of providing, detailed and accurate information based on natural speech samples. This is of obvious significance to the professional working with voice disorders, as success in treatment is based on the client’s conversational speech, and collecting normative data at a conversational level allows us to develop more effective procedures for the assessment and treatment of voice disorders.

HYPOTHESES:

This experiment sought to determine ranges of perceptual and acoustic values that are representative of female and male voice and to use this information as a point of comparison in the investigation of the perceptual and acoustic voice/speech characteristics exhibited by anatomical males who are attempting to speak with a more feminine voice.

The following hypotheses guided the study:
1. Specific acoustic and acoustic/perceptual cues, in combination, will be the most salient in the determination of speaker gender.
2. MTS speakers will demonstrate speech and voice characteristics that are distinct from those of anatomical female and male speakers.
3.
Specific perceptual variables will be related to acoustic variables as follows:
a) F0 average measures will be correlated with trained listener judgements of mean pitch.
b) F0 range measures will be correlated with trained listener judgements of pitch range.
c) F1 will vary systematically with perceptual judgements of raised/lowered tongue position.
d) F1 will vary systematically with perceptual judgements of raised/lowered larynx position.
e) F2 will vary systematically with perceptual judgements of tongue anterior/posterior positioning.
f) Measures of the rate of F0 change and the averaged extent of F0 change will be correlated with pitch variability judgements.
4. A relationship exists between physical characteristics and acoustic characteristics:
a) Measures of age, height, weight, neck length and neck circumference will be related to fundamental frequency.
b) Measures of height and neck length will be related to formant values.
5. MTS speakers who are judged to be producing a “successfully” female voice will demonstrate voice and speech characteristics similar to those of anatomical females. Those judged to be “unsuccessful” will have voice and speech characteristics more similar to those of male speakers.

CHAPTER 2
METHODS

Subject Selection:

Subjects were Male-to-Female Transsexuals (MTSs) at various stages of gender reassignment. Only those individuals who were attempting to speak with a feminine voice were included in the study. It was assumed that these individuals were attempting to alter their voices by means of specific mechanical manipulations of the larynx and vocal tract, as well as changing their speech patterns to more closely resemble what they believed to be a female pattern. A preliminary interview was conducted with the MTS candidates to determine if and how these individuals were altering their voice and/or speech patterns.

Eleven transsexual subjects met criteria set for the study. An equal number of anatomical female and male control subjects was selected for the study.
These speakers were included to represent the normal range of variability against which to compare MTS voices, after Spencer’s model (1988). Beyond the general and vocal health requirements, subject selection for controls was driven by age matching criteria. The age range of subjects chosen to participate was restricted because of the possible confounding effects of voice changes that occur as a result of the aging process (Benjamin, 1981; Ptacek & Sander, 1966; Hartman & Danhauer, 1976). All speakers were between the ages of 23 and 53. (See Appendix I for demographic data).

All participants were healthy, post-pubertal individuals who had no history of injuries or physical, neurological or psychological conditions that would impair speech intelligibility or voice function. All were residents of the Lower Mainland of B.C., and native speakers of English. Individuals identified as having speech, voice or hearing problems judged to have a possible confounding effect on results were excluded from the study, as were those whose dialects are not reflective of the standard Western Canadian pattern.

Subject Screening:

Subjects completed a brief history form detailing their vocal and general health. This form also provided information regarding vocal training, possible dialect differences, as well as any concerns individuals might have about their voice on the day of the recording. A short interview was held between each participant and the examiner to clarify and/or expand on specific issues raised by the history form, to allow the examiner to evaluate the individual’s conversational voice/speech characteristics and to allow the participant to ask any questions or to air any concerns that s/he may have prior to proceeding.

A hearing screening was performed on all of the participants. This component of the initial screening was included to ensure participants did not suffer from a hearing deficit that might affect their voice/speech or their ability to perform the experimental task.
Pure tones were presented at screening levels across the speech frequencies, i.e. at 25 dB for 0.5, 1, 2 and 4 kHz. Four of the MTS subjects failed the hearing screening (MTS1, MTS4, MTS7 and MTS8); however, no speech problems were detected that could be attributed to hearing loss.

Finally, a laryngostroboscopic evaluation was completed to rule out any organic vocal fold pathology. All participants passed this screening procedure.

Experimental Task Construction and Data Collection:

Voice Sample:

After completion of the initial screening process, subjects and controls participated in a brief, informal dialogue with the experimenter. The topic of this discussion was centered on the individual’s experience of getting to the Voice Clinic that day. This preliminary step was taken in the hope that the discussion would add relevance to and lessen the memory component of the experimental task. It also provided the experimenter with a reference sample of the subject’s informal speech against which to judge the representativeness of the experimental sample.

The voice recording was made using a studio-quality, unidirectional headset microphone and a Sony videotape recorder, both having frequency response ranges of 50 Hz - 20000 Hz. Voice samples were recorded on 3/4 inch professional use videotape to ensure high fidelity sound. One channel was used to make the recordings.

Each participant was seated in a chair facing the experimenter. The microphone was then positioned so that it was approximately 3 cm from the speaker’s mouth. Next the task was explained. The experimenter informed the individual that s/he was going to be told a brief anecdote describing the experimenter’s first visit to the Voice Clinic and that the story would be presented twice before s/he would be required to retell it.

The experimenter stressed the fact that the individual’s memory of the story was not as important as the naturalness of his/her speech.
Therefore, it was expected that the exact content and ordering of the events in the story would be variable in each retell performed by subjects. However, the individual being recorded was asked to include a specially designed sentence that was to be embedded in the retell: “You got lost in the building trying to find the Voice Clinic”. Portions of this target sentence were used in all subsequent analyses. Construction of the target sentence is discussed below.

The task was designed so that voice samples would approximate speakers’ conversational speech while controlling for content of the sample and creating a situation in which all speakers were required to say the same sentence. This was necessary for valid comparisons of acoustic analyses.

The target sentence was printed on a blank sheet of paper and provided as a memory aid. Participants were instructed to include this sentence exactly as it was printed on the sheet and to fit the sentence in so that it did not appear as the first or last sentence in the story. They were also informed that if for some reason they had failed to do so, they would be asked to repeat the story and try again.

Each speaker was recorded telling the anecdote twice. The first attempt at retelling the story was treated as practice so that the speaker was more at ease with being recorded and so that the memory component was further alleviated. On occasion, a speaker misunderstood or forgot the specific instructions regarding the inclusion of the target sentence. In this event, the resultant recorded anecdote was not counted. Only retellings in which the target sentence was complete and correctly embedded within the story were considered. The target sentence in each second retelling was used for all acoustic and perceptual analyses performed in this investigation.

Pilot Study

A pilot study was performed prior to the commencement of the larger investigation.
Results from this study were used to select a target vowel that would be stable across speakers and that would be most easily analyzed using the CSpeech computer software programme. The unstressed, central, “schwa” vowel was selected because this segment was found to have unambiguous, evenly spaced formant peaks. The target sentence was constructed so that it contained the target vowel in an unstressed word within the sentence.

The decision to analyze an unstressed word was also based on the apparent stability of such words, as observed in the pilot study. Because these words did not play a large role in creating the overall prosodic pattern of the sentence, they typically did not have wide fluctuation in frequency, intensity and duration. The word “got” in the verb phrase “got lost” was chosen for analysis for a number of reasons. First, it is an unstressed word but sufficiently important to the meaning of the sentence that it would not be left out or radically shortened. Second, although the target phoneme in the word is not a schwa, it is consistently neutralized by speakers in conversational speech. Finally, it does not contain other phonemes that might alter the results of the acoustic analyses performed; for example, it is not acoustically altered by the possible coarticulation effect of an adjacent nasal phoneme or consonant cluster. However, because the vowel is preceded by a back consonant, there is a possibility that F2 values, which typically reflect antero-posterior positioning of the tongue, will be systematically shifted due to coarticulation of the segments (Kent and Read, 1992). This factor was taken into consideration in the interpretation of results.

Structural Measures:

In addition to the voice recording, measures of neck length and circumference were taken.
Length of neck was measured as the vertical distance from the lowermost point of the mastoid bone to the nearest point on the clavicle, after ensuring that the subject was sitting erect with shoulders in a relaxed, neutral position and jaw at right angles to the floor. Circumference was measured at the level of the thyroid lamina with the subject sitting in neutral position. Subjects also reported their weight and height on the history form completed initially.

Analyses of Voice Samples:

After the recordings were obtained, the voices were analyzed in three different ways: acoustically, perceptually and statistically. The procedures involved in these analyses are described below.

Acoustic Analysis:

All acoustic measures were made with CSpeech, Version 4 (Milenkovic, 1986; 1992). Each target sentence was digitized from the videotape master by means of an analogue-to-digital converter. The samples were then analyzed to find the following measures: average F0 and the range of F0 used over the target sentence, rate and extent of F0 change in two target words (“lost” and “voice”), and jitter, shimmer, signal-to-noise ratio, as well as formant values F1 and F2 for the target schwa vowel in the word “got”.

Although CSpeech includes automatic measures of average F0 and F0 range over a given speech segment, artifacts in the sample appeared to skew the data so that results were misleading. Therefore, these values were found manually by segmenting the sentence in such a way that artifactual information was excluded from the analysis. This enabled the programme to use as much of the readable data as possible, thereby providing the most accurate and representative values.

Both formant and voice quality analyses were performed on the schwa vowel in the word “got”. Due to the conversational nature of the task, the length of this vowel varied from speaker to speaker.
For this reason, the longest regular portion of the vowel in each sample, to the nearest glottal pulse, was used for analysis, rather than the same length of segment for each subject. These vowel lengths ranged from 50 msec to 112 msec.

The first and second formant values were determined using CSpeech’s Linear Prediction Coding (LPC) command. A Fast Fourier Transform analysis was done simultaneously to ensure the peaks provided by the LPC corresponded to those shown by the Fourier analysis. Formant values corresponded to peaks on the LPC curve. Data points recorded were the most intense local values. Occasionally, an apparent peak would span several data points. In this event, the average of these points was taken as the peak value.

Jitter, shimmer and signal-to-noise ratio values were determined with the automatic “Jitter” programme command performed on the same portion of the vowel used in the formant analysis.

In order to determine the speaker’s inflectional variability, the rate and extent of F0 change within the target sentence were measured for the two words that were most often and noticeably inflected: “lost”, which was noticeably inflected in 28 of the 33 samples, and “voice”, in 24 of 33. Each target word was considered to be “noticeably inflected” if there was a shift in frequency of 10 Hz or greater. The majority of speakers used rising intonation with both words, although speakers occasionally would use other patterns, i.e. falling, fall/rise, or rise/fall. The most regular portion of the rising contour in rising, fall/rise and rise/fall patterns was targeted for analysis. If the contour was falling, then the most regular portion of this would be used in the analysis. Cursors were placed at the beginning and end of the rise.
The beginning (A) was defined as the frequency value immediately preceding a consistent rise in frequency and the end (B) as the frequency value immediately preceding a leveling out or fall in frequency.

The intonation curve produced by CSpeech is shown only over the voiced segments of the sample to be analyzed. However, because intonation is perceived as continuous over an utterance, the intonation contours were analyzed as if they were also continuous. In the event that there was a gap in the contour before the target words, the change in frequency from the preceding segment was taken into consideration. If there was a change of 10 Hz or more, then the last frequency value of the contour before the gap was counted as A. The time in msec (t) between the cursors was then recorded. The extent of frequency change was defined as Freq (B) - Freq (A). The rate measure was found using the following equation: Rate = (Freq (B) - Freq (A)) / t. Values obtained from the acoustic analysis are reported in Appendix 2.

Perceptual Analysis:

The voice samples were analyzed perceptually by two sets of raters: naive listeners and trained listeners. Master listening tapes were produced for the analyses. The digitized samples used for the acoustic analysis were converted to an analogue signal and recorded on to the master listening tape using a Yamaha amplifier and a Marantz tapedeck. By manipulating the cursors, the sentences could be edited so that only the words of the target sentence remained.

Naive Listeners Analysis:

The master listening tape constructed for the naive listeners consisted of two sections. The first section was used to familiarize the listeners with the range of voices that they would be required to rate. Speakers were randomly selected and recorded in succession with approximately a 2 sec pause between samples. In the second section, voices were again randomly selected and recorded, but more time was given between presentations, approximately 10 sec, and each voice was presented three times.
Two training samples preceded this section, followed by the 33 samples to be used in the study.

Naive Listeners (NLs) were individuals who had no special training in voice, speech or phonetics. Twelve raters were included in this analysis: 5 male and 7 female, ranging in age from early 20’s to early 50’s. There were a total of three rating sessions, with four raters attending each session. All three sessions took place in quiet seminar rooms. Raters were seated around a table and the recording of the voice samples was presented at a comfortable listening level. A single loudspeaker was placed at the front of the room facing the raters and approximately the same distance from each rater. The loudspeaker was driven by a standard stereo tape deck and amplifier.

NLs were required to complete two separate rating forms (see Appendices 3 and 4). The first form, a perceptual rating scale based on Gelfer (1988), included five sets of bipolar adjectives that served as foils and an additional masculine-feminine scale that was of primary interest to this investigation. The raters were required to be naive to the questions being investigated in the study. This information was not provided as it could potentially bias the ratings. Instead, raters were told that they would be listening to a number of adult voices. They were asked to rate each voice along a nine point scale. It was explained that this scale represented a continuum of voices from the most feminine (1) to the most masculine (9) with more ambiguous voices somewhere in the middle; thus the raters did not need to make a gender judgement before rating the voice.

The second rating task required the listeners to make a decision as to the gender of the speaker.
They were instructed to indicate whether the voice was male or female by circling the appropriate gender on the response sheet. The instructions were worded in such a way that their decision would be focused on the representativeness of the voice rather than on what they assumed the speaker’s actual gender to be. The relevance of this distinction will be addressed in the discussion of methodological considerations in Chapter 4.

The first form was distributed at the beginning of the rating session. Following the procedure used by Gelfer (1988), NLs were told they were going to hear a number of adult voices saying the same sentence and that they would hear each voice three times. The rating scale was then read aloud. No specific definitions were given for any of the adjectives as a means of anchoring the scales. Listeners were told to use their own understanding of the terms and their own judgements. For clarity, it was explained that the end points of the scales represented the extremes of a given characteristic, central values being more neutral or ambiguous. It was also explained that both the high pitch - low pitch and femininity - masculinity scales represented a continuum that would include all adult voices, with female voices generally tending toward one end and male voices toward the other.

After NLs were familiarized with the rating form, all the voices to be rated were played so that the listeners would know the range of stimuli. Listeners were reminded that they would hear each voice three times and that they should try to rate two perceptual features on each presentation of the voice. The rating began with two practice voices. When all listeners felt comfortable with the task, ratings of the experimental voices proceeded. Seven (20%) of the thirty-three voices were repeated so that a measure of intrarater reliability could be calculated.

Upon completion of this portion of the rating task, forms were collected and the second form was distributed.
Listeners were told that in this task they would hear the same section of the recording that had been used initially to familiarize them with the range of stimuli. Since the voices followed each other in rapid succession, listeners were told they would have to mark the voice as either male or female based on their first impression. Again, these instructions were given in an attempt to prevent the listeners from trying to guess the gender of the speaker rather than the voice. The rating tasks took one hour in total to complete.

Trained Listeners Analysis:

Four trained listeners were selected from a pool of speech-language pathologists who had attended an instructional workshop on the use of the Vocal Profile Analysis Protocol (VPA) (Laver & Beck, 1991). Because not all the information that could be derived from this protocol would be considered in this investigation, an abbreviated protocol was used (see Appendix 5). The decision regarding what items to retain was based on the acoustic measures being made. Only those perceptual features that were hypothesized to be correlated to specific acoustic measures were included. For example, it has been shown that formant frequencies vary systematically with vocal configuration, and that, in particular, F1 varies with tongue height, while F2 reflects the antero-posterior positioning of the tongue (Kent and Read, 1992). Therefore, lingual tip/blade features and lingual body features were retained. In addition, voice characteristics, or what is described in the VPA as laryngeal features, were also included because these have been found to be related to the acoustic measures of jitter, shimmer and signal-to-noise ratio (Sorensen and Horii, 1983; Wolfe and Ratusnik, 1988).
Pitch features were included so that these measures could be compared to the speakers’ average F0 and F0 ranges; the pitch variability setting was retained for comparison with the rate and extent of F0 shift measures described above.

Prior to rating the experimental voices, the listeners were given a period of retraining to ensure they still met reliability criteria. Voices were then presented in the same manner described above for naive listeners. A total of forty-three voices were rated by the trained listeners: thirty-three experimental voices, seven voices that were repeated to measure intrarater reliability and three training voices that were included at the beginning of the tape to dilute any practice effect. Raters heard each voice a minimum of eleven times: the first presentation as an introduction to the voice followed by five sets of two repetitions. Raters were reminded to make judgements using the three-pass system prescribed in VPA training. Repetitions of the sample series were supplied on request. The experimenter would proceed to the next voice only when all four raters were satisfied with their responses. This rating session took approximately four hours to complete. Perceptual judgements obtained from trained listeners are presented in Appendix 6.

Statistical Analyses:

The data were analyzed using two primary statistical procedures: Discriminant Analysis and a correlational analysis of acoustic, perceptual and anatomical variables.

Discriminant function analysis was performed in order to determine which combination of acoustic and perceptual variables provides the best separation between speaker groups and how MTS speakers are classified based on these discriminating variables.

Discriminant Analysis chooses variables to enter into the discriminant function in a stepwise manner. The resultant function consists of the most salient variables that act to distinguish between groups.
Once these functions are derived, the analysis procedure determines how well the discriminant functions describe the data overall. It then evaluates individual scores obtained and categorizes cases accordingly. The statistical package used in the present analyses, BMDP, provided both a classification matrix and a “jackknifed” classification matrix of the cases (Dixon, 1990). The jackknifed classification is a form of cross-validation that provides a more conservative measure of the probability that a given case belongs to a particular group and, for the purposes of this study, is a more useful measure.

Correlational analyses were performed to determine potential relationships between acoustic, perceptual and anatomical measures. Spearman’s Rho was used to determine correlations as it is best suited to demonstrate relationships including nonparametric data.

CHAPTER 3
RESULTS

Overview

The following research questions will guide the quantified analysis:
1) What acoustic and perceptual variables discriminate best between male and female voice?
2) How are MTS speakers categorized by classification functions that discriminate between male and female voices?
3) How are MTS voices distinct from either male or female voice?
4) What is the strength of relationships between pairs of perceptual features and acoustic measures of interest?
5) How do physical/anatomical measures relate to acoustic and perceptual data?
6) How do naive listeners judge the masculinity/femininity of a voice?
7) How successful were the individual MTS subjects in this study in achieving a female voice?
8) How do naive listener judgements relate to salient acoustic and perceptual features?

Results of the analyses are reported in 4 sections: 1) Preliminary analyses, 2) Stepwise discriminant analyses, 3) Correlational analyses, and 4) Analysis of naive listener data. The raw data are reported in Appendices 1, 2 and 6.
A summary of results of the discriminant analyses is presented in Table 8 and a summary of the correlational analysis of perceptual and acoustic features is presented in Table 9.

Preliminary Analyses

Reliability:

Reliability measures were obtained for acoustic data and the perceptual data for both trained and untrained listeners.

Acoustic Data

Target segments from the voice samples were isolated for acoustic analysis. The criteria for this selection process have been discussed in the Methods section. Because of the variability of connected speech, it was possible that if the segment selected differed in the size or relative position within the target words from one analysis to the next, results could vary. Therefore, inter- and intrarater reliability of the above analysis procedure was verified by having both the investigator and a second rater, who followed the same protocol, repeat 20% of the analyses. A reliability score was found for all measures and correlations for each variable are reported in Table 1. Intrarater reliability for the acoustic measures was 99% and interrater reliability was 93%.

Table 1: Reliability Scores for Acoustic Measures:

Variable            Intrarater   Interrater
Jitter              1.0          0.96
Shimmer             1.0          0.96
SNR                 1.0          0.91
F0 Average          1.0          0.95
F1                  0.98         0.97
F2                  1.0          1.0
Extent F0 shift     1.0          1.0
Rate of F0 shift    0.97         0.89

Perceptual Data

Naive Listeners

Intrarater Reliability:

Test-retest reliability of the naive listeners’ judgements was evaluated. Listening tapes were constructed with 20% of the voices repeated for this purpose. Intrarater reliability of gender choice was found to be very strong. The first reliability measure reflected the naive listeners’ ability to consistently judge speakers to be male or female. Intrarater reliability for anatomical male and female voices was 100%. For the MTS group there was only one listener who changed the gender choice on hearing the voice a second time.

Intrarater reliability was also evaluated for judgements of masculinity/femininity.
The correlations are presented in Table 2. An averaged rho = 0.92 was obtained with values ranging from 0.7 to 0.99. This suggests strong test-retest reliability for judgements of masculinity/femininity for all naive listeners except 3A and 4B who demonstrated moderate reliability.

Table 2: Intrarater Reliability for Masculinity/Femininity Ratings by Naive Listeners

Rater   Rho
1A      0.96
2A      0.91
3A      0.70
4A      0.80
5A      0.93
1B      0.99
2B      0.98
3B      0.94
4B      0.96
5B      0.97
6B      0.91
7B      0.97

Interrater Reliability:

For anatomical male and female voices, there was 100% agreement across judges as to the gender of the speakers. For the MTS group, however, several speakers failed to receive consistent gender judgements. Percentage of naive listener agreement was tabulated for the MTS data and is presented in Table 3. Overall interrater reliability for gender judgements of MTS subjects was 71% with agreement scores ranging from 27 to 100%. Although some scores were very low, none of the data was eliminated as it was important to capture the uncertainty of listeners in judging MTS voices for the analyses and discussion that follow.

Table 3: Percentage Interrater Agreement for Bipolar Gender Judgements of MTS Voices by Naive Listeners

      1A   2A   3A   4A   5A   2B   3B   4B   5B   6B   7B   8B
1A   100
2A    82  100
3A    73   91  100
4A    64   64   55  100
5A   100   82   73   64  100
2B    64   64   73   27   64  100
3B    73   91   64   73   73   55  100
4B    82   82   73   45   82   82   73  100
5B    82   64   55   64   82   64   73   82  100
6B    73   73   64   91   73   36   82   55   73  100
7B    82  100   91   64   82   64   91   82   64   73  100
8B    82   64   73   45   82   64   55   64   64   55   64  100

Strong interrater reliability for judgements of masculinity/femininity was indicated by an averaged rho = 0.84 with pairwise scores ranging from 0.64 to 0.94. A pairwise correlation matrix of reliability for ratings of masculinity/femininity is presented in Table 4. Raters 2A and 3A demonstrated the lowest interrater agreement. These two raters showed moderately strong agreement with each other; however, they appeared to be judging the voices quite differently from the rest of the raters.
The judgements made by the majority of the listeners would provide a more accurate, generally accepted description of the voices and therefore Raters 2A and 3A were excluded from the calculation of the averaged masculinity/femininity score used in subsequent analyses.

Table 4: Correlation Matrix of Interrater Reliability for Masculinity/Femininity Judgements by Naive Listeners

      1A    2A    3A    4A    5A    2B    3B    4B    5B    6B    7B    8B
1A   1.0
2A   .64   1.0
3A   .65   .77   1.0
4A   .75   .46   .43   1.0
5A   .88   .52   .47   .81   1.0
2B   .85   .45   .43   .81   .85   1.0
3B   .81   .43   .43   .84   .87   .92   1.0
4B   .84   .52   .53   .77   .85   .89   .92   1.0
5B   .81   .63   .57   .74   .89   .89   .89   .91   1.0
6B   .87   .60   .56   .76   .85   .89   .88   .85   .89   1.0
7B   .73   .56   .49   .64   .72   .75   .83   .75   .78   .76   1.0
8B   .90   .52   .51   .85   .89   .93   .94   .94   .92   .88   .76   1.0

Trained Listeners

Intrarater Reliability: As with the naive listeners, 20% of the voice samples presented to the trained listeners were repeated so that a measure of intrarater reliability could be obtained. Following the criteria set by Laver (1980), scores that differed by +/- 1 scalar degree were counted as matching. Results are presented in Table 5. Spearman correlations ranging from .85 to .93 indicate strong intrarater reliability for all trained listeners.

Table 5: Intrarater Reliability for Judgments by Trained Listeners.

          rho
Rater 1   0.85
Rater 2   0.9
Rater 3   0.91
Rater 4   0.93

Interrater Reliability: Interrater reliability for the trained listeners was obtained for each perceptual variable rated. As with the intrarater reliability, scores differing by +/- 1 scalar degree were counted as matches following Laver’s protocol for VPA. Correlation matrices are presented in Appendix 7. Due to the nature of and small variance in the perceptual scores, the correlation coefficients may have been too conservative to describe the reliability of these measures. For this reason, a second analysis was performed. A percentage of agreement score was derived for each variable based on the number of times listeners’ scores fell within +/- 1 scalar point of each other.
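The pairwise tally just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the rater labels and the example ratings are hypothetical, and the sketch is not the procedure actually used to produce Appendix 8. It counts a pair of scores as a match when they differ by no more than one scalar point.

```python
from itertools import combinations


def pairwise_agreement(ratings_a, ratings_b, tolerance=1):
    """Percentage of voices on which two raters' scores differ by at most `tolerance`."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same, non-empty set of voices")
    matches = sum(abs(a - b) <= tolerance for a, b in zip(ratings_a, ratings_b))
    return 100.0 * matches / len(ratings_a)


def all_pairwise_agreements(ratings_by_rater, tolerance=1):
    """Agreement percentage for every pair of raters on one perceptual variable."""
    return {
        (first, second): pairwise_agreement(
            ratings_by_rater[first], ratings_by_rater[second], tolerance
        )
        for first, second in combinations(sorted(ratings_by_rater), 2)
    }


if __name__ == "__main__":
    # Hypothetical scalar-degree ratings of four voices by three raters.
    example = {
        "Rater 1": [3, 4, 2, 5],
        "Rater 2": [3, 5, 4, 5],
        "Rater 3": [1, 4, 3, 5],
    }
    for pair, percent in all_pairwise_agreements(example).items():
        print(pair, percent)
```
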
An overall agreement score was then generated for each variable by averaging the pairwise percentages. Results of the second analysis are presented in Appendix 8. A summary of overall interrater reliability of the perceptual variables is shown in Table 6.

Table 6: Summary of Overall Interrater Reliability: Correlations and Percentage of Agreement by Variable

        Overall Reliability
        rho     %
LTB     0.4     64
TFB     0.76    96
TRL     0.87    97
LRL     0.62    89
W       0.38    72
MP      0.56    88
PR      0.88    97
PV      0.82    97

The percentage of agreement measure indicates that the mean pitch and the larynx height variables may be more reliable than indicated by their correlation coefficients. Both mean pitch and larynx height demonstrate relatively strong cross-listener agreement. Whisper and lingual posture measures demonstrated weaker cross-listener agreement than the others, suggesting moderate reliability. No rater was excluded from the averaged score calculated for each variable on the basis of interrater reliability; however, overall reliability of the perceptual variables influenced selection of variables that were ultimately included in the statistical analyses.

Variable Selection:

Further analyses were undertaken to determine the suitability of acoustic and perceptual variables for the statistical analyses. The following acoustic voice characteristics were originally proposed as being important cues for discriminating male from female voice: fundamental frequency, F0 range, F1, F2, jitter, shimmer, signal-to-noise ratio, extent of F0 shifts and rate of F0 shifts. Perceptual variables assumed to correspond with one or more acoustic variables were also proposed. Mean pitch, pitch range, lingual articulatory posture, tongue body position, larynx position, voice source characteristics, i.e. creaky, harsh, and whispery voice qualities, and pitch variability were rated by trained listeners.
A variety of procedures were followed to select the most salient measures for final analysis.

On examination of the raw jitter, shimmer, and signal-to-noise ratio data, it was apparent that a strong relationship existed among these variables and further statistical analysis confirmed this observation. Correlation coefficients demonstrated the following relationships: jitter and shimmer, r = 0.86; signal-to-noise ratio and jitter, r = -0.94; signal-to-noise ratio and shimmer, r = -0.82. Thus, jitter and shimmer values are well predicted from SNR. Because of the strong correlations among these variables, it is difficult to determine the individual contribution of each vocal source characteristic to the male/female distinction. For this reason jitter and shimmer were excluded from further statistical analyses.

Jitter and shimmer values were expected to relate to creaky and harsh voice quality. Trained listeners did not rate any of the voices as having a harsh quality so it is clearly not a discriminating feature in this study. Creaky voice quality was expected to relate to jitter values. As jitter was excluded from the analysis, the vocal creakiness judgements were also excluded because there is no independent evidence to suggest that this perceptual characteristic is a salient cue to gender discrimination.

Based on the reliability scores, trained listeners appeared to have more difficulty in judging both lingual articulatory posture and whispery voice quality than the other perceptual measures. Correlations and percentage of agreement scores were generally low for both. Because the tongue body position measures were more reliable and were also expected to relate to formant frequency measures, lingual articulatory judgements were not included in the analyses.
Similarly, trained listener judgements of whispery voice quality were omitted based on low interrater reliability.

Extent of F0 shift was hypothesized to be important in discriminating between genders, indicating differential intonation patterns. Examination of the raw data revealed that this measure did not accurately capture the intonation patterns of a number of speakers, particularly in the MTS group. It was observed that a number of the MTS speakers had generally lower F0 averages with little variability but would use an exaggerated F0 shift on the target words. The overall impact of this intonation pattern was different from that where the same extent of F0 shift occurred more frequently and there was generally more variability of pitch. For this reason a ratio value was developed to normalize F0 shift data. The extent of F0 shift was divided by the F0 average obtained for the target sentence. This transformation will be referred to hereafter as Ex/F0. This measure relates the average extent of F0 shift used in the target words to the overall pitch level of the sentence rather than the F0 levels used immediately preceding these words.

Criteria for Inferential Statistical Tests:

The final decision made before statistical procedures were performed concerned the alpha level at which inferential statistical tests would be made. In this particular investigation, most of the statistical tests were performed in families. In order to keep experiment-wide error to an acceptable level, an alpha level of .01 was chosen.

Stepwise Discriminant Analyses:

Stepwise Discriminant Analysis was used to address the first three research questions. The purpose of this procedure is to find a weighted combination of the most informative variables by which to maximize the separation between groups.
Results reported below will include the variables that are most important in discriminating between groups, the classification equations and the jackknifed classifications of speakers based on the discriminant function.

The discriminant analysis was run in four different formats. First, discriminant functions for male and female voices were generated and evaluated using only information from anatomical male and female subjects. Then these same procedures were repeated including the MTS subject data to determine the discriminant function that best separated MTS subjects from the other groups. As demonstrated consistently in the literature, F0 average emerged in these data as the dominant variable in the discrimination between groups. To examine the role of the other variables more closely the first two analyses were repeated with F0 average removed. Results of these 4 stepwise discriminant analyses will be presented in greater detail in the following sections.

Analysis #1: Discriminant Analysis for Male and Female Voice

The first analysis was designed to answer the following question: which combination of the selected perceptual and acoustic features best differentiates male from female groups? For this purpose, acoustic and perceptual variable scores from female and male speakers were analyzed to determine the discriminant function which provides maximal separation between groups of speakers. The following variables were entered into the discriminant analysis: F0 average (F0 Av), F0 range (F0 Rg), F1, F2, SNR, average rate of F0 shift (R Av), average extent of F0 shift / average F0 (Ex/F0), tongue position (front-back) (TFB), tongue position (raised-lowered) (TRL), larynx position (raised-lowered) (LRL), mean pitch (MP), pitch range (PR), and pitch variability (PV).

As discussed in the Methods section, the Discriminant Analysis is a multistepped procedure. Preliminary stages of the analysis reported univariate F values, presented in Table 7.
Considered individually at this stage, F0 average, F0 range, SNR, F1 and rate of F0 shift each reliably discriminated between the groups.

Table 7: Significant univariate F values provided in the Stepwise Discriminant Analysis

Variable                  Univariate F value
F0 Average                91.64
Signal-to-Noise Ratio     12.3
F0 Range                  10.12
F1                        9.24
Rate of F0 Shift          9.14

The first analysis identified a discriminant function based on two variables: F0 average and larynx position:

1. F0 Av: F = 91.6, df = 1,20
2. LRL: F = 64.8, df = 2,19

The resultant discriminant function was:

0.38574(F0 Av) + 8.49171(LRL) - 82.4178 = Group Membership

The discriminant function was used to calculate individual scores for all speakers. A positive score places the speaker in the "female" group and a negative score places the speaker in the "male" group. This procedure, in effect, fit each case to the discriminant profiles of male and female speakers. Resulting case values are shown in Fig. 1.

Jackknifed classification revealed that all 11 female speakers were correctly classified as female and all 11 male speakers were correctly classified as male. Discriminant scores were also calculated for MTS cases and the values were plotted in Fig. 1. Four MTS subjects were classified as female and six MTS subjects were classified as male.

Fig. 1: Histogram of discriminant scores, two groups with F0 included. Classification function value calculated for each case (horizontal axis from approximately -4.2 to 4.2). Groups: a = MTS, b = Female, c = Male. Group means: 1 = MTS, 2 = Female, 3 = Male.

Results of this first analysis indicate that the acoustic measure of F0 average and perceptual judgements of larynx position reliably differentiate male speakers from female speakers. This analysis does not, however, identify characteristics that may be unique to MTS speakers.
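The sign rule for the Analysis #1 function can be sketched in Python as follows. The coefficients are those reported above; the function names, the example LRL inputs (assumed here to lie on a converted scale on which lower values mean a more raised larynx) and the treatment of a score of exactly zero are illustrative assumptions and are not specified in the text.

```python
def analysis1_score(f0_av, lrl):
    """Discriminant score from the reported Analysis #1 function."""
    return 0.38574 * f0_av + 8.49171 * lrl - 82.4178


def analysis1_group(f0_av, lrl):
    """Positive score -> "female", negative score -> "male".

    A score of exactly zero is assigned to "male" here only so that the
    function always returns a label; the source does not define this case.
    """
    return "female" if analysis1_score(f0_av, lrl) > 0 else "male"


# Hypothetical inputs: the same F0 average with two different LRL scores.
print(analysis1_group(157.0, 3.0))
print(analysis1_group(157.0, 2.0))
```

With these illustrative inputs, an identical F0 average falls on opposite sides of the boundary depending on the larynx position score alone, which shows how the second variable can override an F0 value that lies in the female range.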
The next analysis was performed to explore whether MTS speakers show such distinguishing characteristics.

Analysis #2: Discriminant Analysis for Male, Female and MTS Voice

The second analysis was designed to answer the following question: what combination of variables distinguishes MTS voice from both male and female voice? All variables used for discriminating male and female groups were again entered into the analysis.

The analysis identified discriminant functions based on three variables: F0 average, larynx position and tongue position (front-back).

1. F0 Av: F = 31.8, df = 2,29
2. LRL: F = 18.9, df = 4,56
3. TFB: F = 16.6, df = 6,54

The resultant discriminant functions were:

a) 0.80178(F0 Av) - 5.70681(TFB) + 26.86012(LRL) - 87.31055 = MTS Group
b) 1.02049(F0 Av) - 10.46357(TFB) + 34.63707(LRL) - 133.0677 = Female Group
c) 0.68367(F0 Av) - 3.14099(TFB) + 26.40902(LRL) - 77.99504 = Male Group

These functions were used to calculate individual scores for all speakers, and each case was fit to the discriminant profiles of male, female and MTS speakers. A jackknifed classification revealed that all 11 female speakers were correctly classified as female; 8 male speakers were correctly classified as male and 3 were classified as MTS speakers; and 8 MTS speakers were classified in the separate MTS group and 2 were classified as male.

Results indicate that the acoustic measure of F0 average and perceptual judgements of larynx position and tongue anterior-posterior positioning differentiated between the three groups. Again, F0 average was the dominant factor in discriminating between groups. In order to determine the contribution of other variables in defining the groups, the above two analyses were repeated without the F0 average variable.
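A minimal Python sketch of how three classification functions of this kind are typically applied: each function is evaluated for a case and the case is assigned to the group with the largest value. The coefficients are those reported above; the assignment-by-maximum rule is the usual convention for classification functions and is assumed here, and the input values are hypothetical.

```python
# Reported coefficients: (F0 Av, TFB, LRL, constant) for each group.
CLASSIFICATION_FUNCTIONS = {
    "MTS":    (0.80178, -5.70681, 26.86012, -87.31055),
    "female": (1.02049, -10.46357, 34.63707, -133.0677),
    "male":   (0.68367, -3.14099, 26.40902, -77.99504),
}


def classification_scores(f0_av, tfb, lrl):
    """Evaluate all three classification functions for one case."""
    return {
        group: a * f0_av + b * tfb + c * lrl + const
        for group, (a, b, c, const) in CLASSIFICATION_FUNCTIONS.items()
    }


def classify(f0_av, tfb, lrl):
    """Assign the case to the group with the highest function value."""
    scores = classification_scores(f0_av, tfb, lrl)
    return max(scores, key=scores.get)


# Hypothetical cases: identical perceptual scores, different F0 averages.
print(classify(200.0, 3.0, 3.0))
print(classify(100.0, 3.0, 3.0))
```

Because the F0 coefficient is largest for the female function and smallest for the male function, raising F0 with the other scores held constant moves a case from the male toward the female group.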
Results are described below.

Analysis #3: Discriminant Analysis of Male and Female Voice with F0 excluded

The third analysis was designed to answer the following question: given that F0 average is the most dominant variable of those analyzed, what other variables are important in distinguishing between male and female voices when F0 average is removed from the analysis? All above listed variables were entered into the analysis with the exception of F0 average.

The analysis identified a discriminant function based on two variables: SNR and F1.

1. SNR: F = 12.3, df = 1,20
2. F1: F = 9.2, df = 2,19

The resultant discriminant function was:

0.0162(F1) + 0.35253(SNR) - 13.13242 = Group Membership

This function was used to calculate individual scores for all speakers. Again, this procedure fit each case to the discriminant profiles of male and female speakers.

A jackknifed classification revealed that 9 female speakers were correctly classified as female and 2 were classified as male, while 10 male speakers were correctly classified as male and 1 was classified as female. Discriminant scores were also calculated for MTS cases. Four MTS subjects were classified as female and six MTS subjects were classified as male. Case values are shown in Fig. 2.

Fig. 2: Histogram of discriminant scores, two groups with F0 excluded. Classification function value calculated for each case (horizontal axis from approximately -1.75 to 3.15). Groups: a = MTS, b = Female, c = Male. Group means: 1 = MTS, 2 = Female, 3 = Male.

Results indicate that the acoustic measures of SNR and F1 were dominant features differentiating between the two groups.
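Because the Analysis #3 function is linear in two variables, its zero-score boundary can be solved for F1 at any given SNR. The sketch below uses the reported coefficients and assumes the same sign convention as in Analysis #1 (positive score for "female"), which is not restated for this analysis; the function names and the SNR value are illustrative.

```python
def analysis3_score(f1_hz, snr_db):
    """Discriminant score from the reported Analysis #3 function."""
    return 0.0162 * f1_hz + 0.35253 * snr_db - 13.13242


def boundary_f1(snr_db):
    """F1 (Hz) at which the score is zero for a given SNR (dB)."""
    return (13.13242 - 0.35253 * snr_db) / 0.0162


# Illustrative SNR value; the printed boundary is about 610 Hz.
print(round(boundary_f1(9.2), 1))
```

Under these assumptions a relatively low SNR can still yield a positive score when F1 is high enough, since the two terms trade off against each other along the boundary.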
The final analysis was performed to discriminate among three groups without the contribution of the F0 average variable.

Analysis #4: Discriminant Analysis of Male, Female and MTS Voice with F0 excluded

The fourth analysis was designed to answer the following questions: when F0 average has been removed from the analysis, are there other variables which differentiate among the three groups? How are MTS speakers distinct from the other two groups? All above listed variables were entered into the analysis with the exception of F0 Av.

The analysis identified discriminant functions based on two variables: Ex/F0 and SNR.

1. Ex/F0: F = 6.9, df = 2,29
2. SNR: F = 6.5, df = 4,56

The resultant discriminant functions were:

a) 0.5429(SNR) + 15.85832(Ex/F0) - 7.3859 = MTS Group
b) 0.89834(SNR) + 4.17815(Ex/F0) - 7.82504 = Female Group
c) 0.48323(SNR) + 4.87385(Ex/F0) - 3.48662 = Male Group

A jackknifed classification revealed that 6 female speakers were correctly classified as female, 2 were classified as MTS speakers and 3 were classified as male speakers. Seven male speakers were correctly classified as male, 1 was classified as an MTS speaker and 3 were classified as female speakers. Six MTS speakers were classified as belonging to the MTS group, 2 were classified as male speakers and 2 were classified as female speakers.

Results indicate that the acoustic measures Ex/F0 and SNR were dominant features differentiating the three groups.
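The jackknifed counts reported for the two three-group analyses can be tabulated to compare overall classification accuracy. The sketch below only re-expresses counts given in the text; the overall percentages are derived here and are not reported in the original analysis.

```python
# Rows: actual group; inner keys: group assigned by jackknifed classification.
ANALYSIS_2 = {  # three groups, F0 average included
    "female": {"female": 11},
    "male": {"male": 8, "MTS": 3},
    "MTS": {"MTS": 8, "male": 2},
}
ANALYSIS_4 = {  # three groups, F0 average excluded
    "female": {"female": 6, "MTS": 2, "male": 3},
    "male": {"male": 7, "MTS": 1, "female": 3},
    "MTS": {"MTS": 6, "male": 2, "female": 2},
}


def overall_accuracy(confusion):
    """Proportion of cases assigned to their actual group."""
    correct = sum(row.get(actual, 0) for actual, row in confusion.items())
    total = sum(sum(row.values()) for row in confusion.values())
    return correct / total


print(overall_accuracy(ANALYSIS_2))  # 27 of 32 cases
print(overall_accuracy(ANALYSIS_4))  # 19 of 32 cases
```

The drop from 27 to 19 correctly classified cases out of 32 gives a compact view of how much separation is lost once F0 average is withheld.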
A summary of the results of the above four analyses is presented in Table 8, and a summary of MTS classifications based on the discriminant functions above is shown in Appendix 9.

Table 8: Summary Table of Results for Stepwise Discriminant Analysis

Analysis # (b)   Discriminating Variable       F (a)   DF
1                F0 Average                    91.6    1,20
                 Larynx (raised/lowered)       64.8    2,19
2                F0 Average                    31.8    2,29
                 Larynx (raised/lowered)       18.9    4,56
                 Tongue (front/back)           16.6    6,54
3                Signal-to-Noise Ratio         12.3    1,20
                 F1                            9.2     2,19
4                Average Extent of F0 shift    6.9     2,29
                 Signal-to-Noise Ratio         6.5     4,56

(a) F values for second and subsequent variables treat prior variables as covariates.
(b) Analysis conditions:
    1 Two groups, with F0 average
    2 Three groups, with F0 average
    3 Two groups, without F0 average
    4 Three groups, without F0 average

Correlational Analysis of Perceptual and Acoustic Variables:

Spearman rank-order correlation coefficients (rho) were calculated to address the question of whether specific perceptual variables are correlated with acoustic variables. Only data from the anatomical male and female speakers were used, as the object of this analysis was to establish correlations between acoustic and perceptual measures for those individuals using a relatively neutral vocal tract configuration. A correlational analysis between SNR and its hypothesized perceptual correlate, Whisper, was not performed since Whisper was eliminated due to poor reliability of the scores. Scores marked on the protocol forms needed to be converted to a linear scale so that the data could be more easily interpreted and related to acoustic data.
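For reference, the Spearman rank-order coefficient can be computed as the Pearson correlation of the ranks, with tied values sharing their average rank. The following is a minimal pure-Python sketch; the function names and the example data are hypothetical and are not taken from the study.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values starting at i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sqrt(sum((a - mx) ** 2 for a in rx))
    sy = sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (sx * sy)


# Hypothetical data: an acoustic range in Hz against a converted perceptual
# score on which low values stand for a wide range.
acoustic_range = [60, 80, 100, 120, 140]
perceptual_score = [5, 4, 4, 2, 1]
print(spearman_rho(acoustic_range, perceptual_score))
```

With a scale on which low scores represent the wide end, a close agreement between the acoustic and perceptual measures appears as a strongly negative rho.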
Table 9 presents the conversions made from the protocol-form scores to the linear scale.

Table 9: Key to Conversion of Laver Scale Scores for the Purpose of Analysis

Variable   Low Score          High Score
TFB        Fronted            Retracted
TRL        Raised             Lowered
LRL        Raised larynx      Lowered larynx
MP         High mean          Low mean
PR         Wide range         Narrow range
PV         High variability   Low variability

The following results were obtained for each of the hypothesized correlations:

a) F0 average measures will be correlated with trained listener judgements of mean pitch.

The Spearman correlation coefficient suggests a low correlation, rho = 0.47.

b) F0 range measures will be correlated with trained listener judgements of pitch range.

This relationship was demonstrated to be statistically strong, rho = -0.84. The inverse nature of this relationship is explained by the fact that wide ranges are represented by low perceptual scores on the converted Laver scale. The correlation is shown in Fig. 3.

Fig. 3: Relationship between acoustic F0 range and pitch range judgments (horizontal axis: F0 Range; female and male speakers plotted separately).

c) F1 will vary systematically with perceptual judgements of raised/lowered tongue and larynx positioning.

A moderately strong inverse relationship between F1 and TRL is demonstrated by rho = -0.72. The correlation is shown in Fig. 4. The correlation coefficient for F1/LRL was weak: rho = -0.35.

Fig. 4: Relationship between F1 and tongue position (raised/lowered) (horizontal axis: F1, 400 to 800).

d) F2 will vary systematically with perceptual judgements of tongue anterior/posterior positioning.

The relationship between F2 and TFB was weak, rho = -0.13.

e) Measures of the rate of F0 change and the averaged extent of F0 change will be correlated with pitch variability judgements.

A moderate inverse relationship between rate of F0 change and pitch variability was demonstrated by rho = -0.65. The correlation coefficient for averaged extent of F0 shift and pitch variability was moderately strong: rho = -0.72. These correlations are shown in Figures 5 and 6.

Fig. 5: Relationship between rate of F0 shift and pitch variability judgments (horizontal axis: Rate of F0 Shift, 0.0 to 0.8; female and male speakers plotted separately).

Fig. 6: Relationship between averaged extent of F0 shift and pitch variability judgments (horizontal axis: Averaged Extent of F0 Shift, 0.0 to 0.5; female and male speakers plotted separately).

In summary, moderately strong or strong relationships were found for four of the seven hypothesized variable pairs. Results of the correlational analysis of acoustic and perceptual variables are presented in Table 10.

Table 10: Summary of Correlational Analysis of Acoustic and Perceptual Variables

Acoustic   Perceptual   Spearman Correlation Coefficient
F0 Av      MP            0.47
F0 Rg      PR           -0.84
F1         TRL          -0.72
F1         LRL          -0.35
F2         TFB          -0.13
R Av       PV           -0.65
Ex/F0      PV           -0.72

Correlations between Physical Measures and Acoustic and Perceptual Data

In order to determine the extent to which the physical features of speakers influenced the acoustic and perceptual characteristics, correlational analyses were performed to investigate several relationships that have been reported in the literature. Relationships between average speaker F0 or perceptual judgements of speaker pitch and physical size, as reflected by the speaker's height, weight, neck length and neck circumference, are reported first. The possible influence of speaker age on F0 average was also investigated. As with the above correlational analysis, only data from anatomical male and female speakers were used, as the inclusion of the ambiguous MTS data may have confounded any significant relationship that may have existed. Results are presented in Table 11.

Table 11: Spearman Correlations between Physical Features and Perceptual and Acoustic Variables

         Height   Weight   Neck Length   Neck Circ.   Age
F0 Av    -0.46    -0.38    -0.54         -0.67         0.06
F1       -0.39    N/A      -0.36         N/A          N/A
F2       -0.11    N/A      -0.17         N/A          N/A
MP        0.09     0.00     0.14          0.31        -0.12
TRL       0.32    N/A       0.08         N/A          N/A
LRL       0.24    N/A       0.28         N/A          N/A
TFB       0.24    N/A       0.29         N/A          N/A

No strong relationships between pairs were demonstrated.
However, neck circumference and F0 average demonstrated a moderate correlation. These results suggest that, for this subject sample, speaker F0 and pitch are not significantly related to physical attributes.

Analysis of Naive Listener Judgments for the Three Speaker Groups

Data obtained from the naive listeners were analyzed to provide a reference point from which to interpret the results of the statistical analyses. Firstly, naive listeners were asked to rate each of the sample voices as either "female" or "male". Naive listeners were consistent in categorizing anatomical male and female voices, with all listeners demonstrating 100% accuracy. Judgments for MTS voices were less consistent across listeners. This was expected, as each of the MTS subjects was using her own criteria for achieving a more female sounding voice. The percentage of female ratings that each MTS subject received is presented in Fig. 7 and reflects the overall success with which that speaker is adopting a female pattern of speech according to social validation criteria. The underlying assumption is that MTS speakers who are successfully identified as female will demonstrate voice and speech characteristics similar to those of anatomical females. Those who are "unsuccessful" will have voice and speech characteristics more similar to those of male speakers. With this in mind, we can determine whether the voice and speech characteristics of successful MTS speakers resemble those found for anatomical females using the acoustic and perceptual values reported above.

Fig. 7: Success of MTS voice change based on naive listener judgments (horizontal axis: Subject Number).

A social validation measure of the degree of masculinity or femininity represented by each voice was obtained to further refine the definition of MTS speaker success. The scores obtained for this measure are presented in Appendix 10.
This measure, in conjunction with the hypothesized discriminating variables reported above, may give a clearer indication of which characteristics listeners are using in determining speaker gender. A Spearman rank-order correlational analysis was performed to identify the existence of such relationships. Results of this analysis are presented in Table 12.

Table 12: Spearman Correlations between Experimental Variables and Naive Listener Judgements of Masculinity/Femininity

Variable   Spearman Correlation Coefficient
F0 Av      -0.91
F0 Rg      -0.65
F1         -0.82
F2         -0.23
SNR        -0.72
R Av       -0.62
Ex/F0      -0.39
MP          0.44
PR          0.45
TRL         0.52
LRL         0.5
TFB         0.55
PV          0.42

The same variables that played an important statistical role in discriminating between groups also have the strongest relationship with listener judgements. Of the relationships shown above, the strongest is between perceived masculinity/femininity and F0 average. This relationship is shown in Fig. 8.

Fig. 8: Scatterplot of the relationship between F0 average and masculinity/femininity judgments (horizontal axis: F0 Average, 80 to 220; female and male speakers plotted separately).

Results of the naive listener analysis will guide the discussion with respect to the importance of particular variables in gender discrimination, the factors that separate the MTS speakers from anatomical male and female groups, and relationships between perceptual and acoustic variables.

SUMMARY OF MAJOR RESULTS

To summarize the results of this study, we will return to the primary research questions.

1) What acoustic and perceptual variables discriminate best between male and female voice?

Results of the discriminant analyses described above suggest that F0 average is the most dominant characteristic in distinguishing male from female voice. In combination with the perceptual judgement of larynx height, two distinct classifications for male and female were derived.
In the absence of F0, two other characteristics were important in the discrimination between male and female groups: signal-to-noise ratio and F1. These variables, although not as powerful in distinguishing between the two groups, correctly classified 9 out of 11 female and 10 out of 11 male speakers.

2) How are MTS speakers categorized by the classification functions that discriminate between male and female voices?

MTS subjects were classified on the basis of discriminant functions derived for anatomical male and female speakers. Regardless of the presence or absence of F0 in the discriminant function, 4 MTS speakers were classified as female and 6 as male.

3) How are MTS voices distinct from either male or female voice?

When formatted to provide discriminant functions for the three speaker groups, the programme identified the variables that separated MTS speakers from either male or female groups. When F0 was included in the discriminant function, F0 average, larynx height and tongue position (front/back) distinguished MTS speakers from the others. In the absence of F0, Ex/F0 and SNR provided the best separation between the three groups.

4) What is the strength of relationships between pairs of perceptual features and acoustic measures of interest?

A correlational analysis of acoustic and perceptual variables confirmed a moderate to strong relationship for the following pairs of variables: F0 range and pitch range, F1 and tongue position (raised/lowered), rate of F0 shift and pitch variability, and extent of F0 shift relative to average F0 and pitch variability.

5) How do physical/anatomical measures relate to acoustic and perceptual data?

No strong relationships were identified between physical characteristics and perceptual/acoustic features. This in turn implies that speakers have some degree of flexibility within their vocal system.
For MTS speakers, this means that the individual's ability to achieve a voice with a higher F0 and perceived pitch may not be as strictly limited by the anatomically male vocal tract as past research has suggested (Coleman, 1983).

6) How do naive listeners judge the masculinity/femininity of a voice?

Moderate to strong correlations were found between masculinity-femininity judgements and F0 average, SNR, F1, rate of F0 shift and F0 range. This provides support for the importance of these variables when gender judgements are made by listeners.

7) How successful were the individual MTS subjects in this study in achieving a female voice?

The data from the MTS subjects were analyzed to determine the success with which these individuals were achieving a voice representative of the female pattern. There was wide variation in success between speakers. Those judged to be "successful" will serve as a reference point from which to discuss goals to be targeted in voice therapy.

CHAPTER 4

DISCUSSION

Overview

The purpose of this study was to evaluate the voice and speech characteristics of Male-to-Female Transsexuals and to compare them to those of anatomical male and female subjects with two goals in mind: 1) to suggest the speech characteristics that are most important in distinguishing male from female voice and 2) to determine specific voice and speech characteristics to target in treatment of those individuals wishing to achieve a more feminine voice.

Acoustic and perceptual data were considered in the analyses. The two types of data provide both objective and subjective descriptions of the contrast between male and female voice. There is no one-to-one correlation between acoustic and perceptual measures, and this stands to reason because of the complex nature of human speech. However, for the purpose of developing effective programmes of intervention, it is important to understand how the acoustic and perceptual aspects of speech are related.
Thus the extent to which acoustic and perceptual measures are correlated was also investigated.

To discuss the findings of this study, we will return to the experimental hypotheses and the research questions that guided the analyses and relate these to the results.

Acoustic and Perceptual Cues to Gender Identification

Results of the Discriminant Analyses

Analysis #1: The first hypothesis was that specific combinations of acoustic and perceptual variables could be identified as salient to the distinction between female voice and male voice. Analysis was guided by the following research questions: what acoustic and perceptual variables discriminate best between male and female voice, and how do classification functions based on female vs. male voice/speech characteristics categorize MTS speakers? Ranges of scores are provided to suggest the typical values found for anatomical female and male speakers.

The first discriminant analysis performed identified average speaker F0 and the perceptual judgement of larynx height as providing the best separation between the two groups. F0 was the dominant feature, accounting for the majority of the variance between groups. The ranges of average F0 found for this study were 88 to 142 Hz for male speakers and 146 to 204 Hz for female speakers. The proximity of the values at the low end of the female range and the upper limits of the male range makes it difficult to suggest a clear separation between the groups on the basis of F0 alone. The remaining variables were then evaluated by the analysis programme to determine whether a better separation could be achieved.

Once F0 average had been factored out, the judgement of whether the speaker was using a raised or lowered larynx position improved the separation and was entered into the discriminant function. No additional features were found to be of help in discriminating between the groups. The success of the resultant function can be seen in how well it classifies speakers.
In this case, it correctly classified all male and female subjects. Therefore, this combination provides the best separation of male and female speakers based on the voice and speech characteristics of interest to the present investigation. The extent of this separation for the data in the present study is shown in Appendix 11, Figure A. This result is in agreement with numerous studies that have also found F0 to be the primary cue to gender identification. The connection between physical/anatomical dimensions of the vocal tract and laryngeal apparatus cannot be denied. Titze (1989, 1994) demonstrated that the membranous length of the vocal folds is the primary determiner of speaker F0. The vocal folds of males typically have a membranous length 60% longer than those of females and therefore, based on the laws of physics, will vibrate at a lower frequency.

It is of interest to consider how the discriminant function is classifying individual speakers. The first and most obvious way speakers were categorized was according to F0 average. The raw data clearly demonstrate that as F0 increases, the probability of that speaker being female also increases. While the F0 data provide a largely linear relationship for gender identification, the interaction between F0, LRL and gender identification is not so transparent.

Examination of the raw data obtained for larynx height demonstrated that female voice is characterized by higher F0 and a generally neutral larynx position. For male speakers there appeared to be two distinct patterns. Male speakers with a low F0 were typically perceived as using a slightly lowered larynx position, while male speakers with higher F0 averages were judged to be using a raised larynx position.
Thus, higher F0 averages associated with raised larynx position appear to be one of two subsets defining male voice. This effect may have been a result of outliers in the control subjects. Although generally all female speakers received neutral larynx position scores, the two female speakers with the lowest F0 averages, F9 and F11, were judged as using lowered larynx positions. Male subjects, on the other hand, demonstrated lower F0 values but received a range of LRL scores from raised to lowered larynx position. M9, the male control with the highest F0 average, was rated as using a raised larynx position. These factors help to account for the classification of the MTS speakers.

Based on the first discriminant function, in which all subjects were forced to be classified into male or female groups, four of the MTS subjects were classified as female and six as male (one MTS speaker was excluded from all discriminant function analyses due to missing data for two of the variables analyzed). These speakers demonstrated F0 averages ranging from 91 to 178 Hz. Six MTS speakers used F0 averages well within the female range, two used averages in the male range, while the remaining two had F0 values in the ambiguous range between the male and female speaker groups (see Appendix 11, Figure B for the distribution of average MTS speaker F0s). The classification function produced as a result of the analysis categorized only three of the MTS speakers with F0 averages within the female range as male. This unexpected result can be explained by considering the LRL scores that these individuals received. Each one of these speakers was judged to be using a raised larynx position, similar to the relationship between F0 average and LRL described for M9. Two of the MTS cases are of particular interest. MTS1 demonstrated an F0 average in the middle of the female range, at 178 Hz. However, this subject was nevertheless classified as a male speaker, apparently because of the extreme raised larynx score that was obtained.
The second noteworthy observation was the classification of MTS7 and MTS10. Both of these speakers had identical F0 scores of 157 Hz, placing them in the female range. However, their larynx position scores differed in that MTS7 was judged to be using a neutral larynx position while MTS10 was perceived as using a raised larynx position. As a result, MTS7 was classified as female and MTS10 was classified as male.

These findings suggest that in order for an MTS speaker to be successfully categorized as a female speaker, it is necessary for that individual to speak using a higher F0, preferably above 145 Hz. It is also necessary for the higher F0 to be associated with a neutral larynx position, as this was clearly the factor determining a male classification for some of the MTS speakers and because female speakers typically received neutral larynx position scores. Females will have shorter vocal tracts given their generally smaller physical size. Thus, for anatomical male speakers wishing to achieve a more feminine voice, it would seem reasonable to attempt to shorten the length of the resonating tube. This could be accomplished by raising the larynx. The difficulty in raising the larynx lies in the perceptual effects associated with the muscular activity involved. It is possible that listener judgements of larynx height reflect perceived muscular tension as well as actual larynx positioning, as these two characteristics are interrelated (Laver, 1980). In this event, raised larynx ratings obtained in this study could be indicative of improper vocal technique. In order to compensate for the larger anatomical size of MTS vocal tracts, therapy should strive for a raised larynx position to shorten the resonating tube while maintaining a relaxed larynx.
Therapy, therefore, should include relaxation and strengthening exercises to promote correct vocal technique, thereby reducing muscular tension that could be distinguishing MTS speakers from anatomical female speakers.

The results also demonstrate the way in which the trained listeners are making their perceptual judgements. It seems that they are making an a priori judgement as to the anatomical gender of the speaker, and then deciding on the score that should be given to that individual with reference to all speakers in that gender group. This is reflected in the unusual relationship between F0 average and LRL scores defining the male group. Again, this issue will be discussed further in the methodological considerations section of this chapter.

Analysis #2: It was also hypothesized that MTS speakers would demonstrate speech and voice characteristics that would set them apart from both female and male speakers. From the results of Analysis #1, we have seen that F0 average and larynx height provide reliable definitions for male and female groups. The next question addressed was how the MTS group was different from both the male and female groups.

The first analysis was repeated using data from all three subject groups. The discriminant analysis identified three characteristics that best separated the three groups. These were F0 average, larynx position and tongue position along the front/back dimension (TFB). As in the first analysis, F0 average was the dominant factor in discriminating between groups. The main difference between the analyses was the inclusion of the TFB variable, suggesting that this was instrumental in separating MTS speakers from all others. Examination of the raw data demonstrates that generally MTS scores for TFB were more similar to those of female speakers than male speakers. Thus, this feature reliably differentiated MTS speakers from the male group, as defined by the discriminant function.
Again, it is important to consider the classification of individual cases in order to understand the differences between the groups.

First let us examine how well the classification functions derived in this analysis categorize the speakers. All female speakers were correctly placed in the "female" group. The classification of male speakers was not as successful as it had been in the first analysis. Eight out of eleven were correctly classified as male; however, three were misclassified into the MTS group: M23, M31 and M32. M23 and M31 were the two speakers with the highest F0 average in the male subject pool. These scores, being closer to the average MTS F0 values than those of the other two groups, were likely the determining factors in their classification. The third misclassified male speaker had an F0 average well within the male range; however, both his LRL and TFB scores were more typical of MTS speakers in that he was perceived as using a raised larynx position and fronted tongue position.

The addition of the TFB variable, describing the extent to which the speaker is using a fronted or backed tongue position, allowed for essentially all MTS speakers to be placed in a distinct group. The exception was MTS11, who was classified as male despite the fact that TFB scores for this individual more closely resembled scores in the MTS and female groups. This was apparently due to this speaker's F0 average and LRL scores, which fell within the range of values associated with the male group.

The conclusion that can be drawn from the second analysis is that MTS speakers can be separated from the male group based on front/back positioning of the tongue body during articulation: the more fronted the articulation, the less likely that individual will be categorized as male. What can also be seen is that the addition of this factor does not move any of the MTS speakers into the female group.
Thus, without altering F0 or perceived larynx height to resemble the female patterns, MTS speakers will not be categorized as female. This suggests that therapy may be most effective if all three of these areas are targeted.

Analysis #3: It was hypothesized that in the absence of F0 as a discriminating factor in categorizing speakers, there would be other voice and speech features that would contribute toward the identification of speaker gender. Several voice and speech characteristics were considered that have been shown in past research to reliably differentiate between genders.

Analysis #3 was performed using only the anatomical female and male data to obtain discriminant functions. The analysis identified two variables which separated the data into two groups. These variables were SNR and F1. Although SNR and F1 reliably discriminated between male and female groups, they did not demonstrate the same discriminating power as F0 average and LRL in the first analysis. This can be seen in the fact that two of the eleven females were misclassified as male and one of the male speakers was misclassified as female, and that the resultant discriminant function accounted for 49% of the variance as compared to the 83% accounted for by the discriminant function of Analysis #1.

For female speakers, SNR raw scores ranged from 7.6 to 22.2 dB and F1 scores ranged from 527 to 781 Hz. For male speakers, SNR values ranged from 4.1 to 12.4 dB and F1 values from 449 to 625 Hz. Scores for both SNR and F1 were higher for female speakers than for male. The speakers who were misclassified showed the opposite trend. F19 and F20 both had SNR and F1 scores lower than those typical of the other female speakers, falling well within the male range. The misclassified male speaker demonstrated a fairly typical SNR score, but had an unusually high F1 frequency of 625 Hz.
This shift in F1 was reflected in the perceptual judgements for this speaker. Trained listeners judged both tongue body and larynx position to be raised.

The classification functions categorized four MTS speakers as female and six as male. As expected, all MTS speakers categorized as female demonstrated higher SNR and F1 scores than those that were categorized as male. Only MTS4 had lower SNR values than typically seen in the female data (9.2 dB); however, apparently due to this individual’s high F1 score, this case was nevertheless categorized as female.

The fact that F1 values differentiated between genders is consistent with the literature (Kent and Read, 1992; Borden and Harris, 1984) and is predicted by acoustic theory based on the general difference in the length of male and female vocal tracts (Fant, 1960). However, results found for SNR were in contradiction to the expected relationship and the findings of other researchers (Rammage, 1992; Nittrouer et al., 1990). Female voices demonstrated higher SNR values, suggesting better vocal fold vibratory efficiency for female speakers. This may be explained by the algorithm that was used to obtain the SNR measure for CSpeech (Milenkovic, 1992). This algorithm is dependent on jitter and shimmer values for the same programme. Results for this and other studies have indicated that jitter and shimmer values are inversely related to average F0. It has also been shown that male voice is associated with lower jitter and shimmer values. The SNR measure calculated by CSpeech appears to be positively correlated with F0.

Before moving to the last discriminant analysis performed, let us consider more closely the implications of the F1 scores obtained for the speakers in this study. F1, as a measure of vocal tract resonance, will by definition be influenced by the physical size and configuration of the vocal tract.
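The dependence of F1 on vocal tract length can be illustrated with the idealized uniform-tube model from acoustic theory. The following is a minimal sketch and not part of the study's analysis: it assumes a tube closed at the glottis and open at the lips, whose resonances fall at odd multiples of c/4L. The function name `tube_formants` is hypothetical, and the tract lengths (17.5 cm and 14.5 cm) and speed of sound (35,000 cm/s) are illustrative textbook-style values, not measurements taken from the speakers in this study.

```python
def tube_formants(length_cm, n_formants=3, speed_of_sound_cm_s=35000.0):
    """Resonances (Hz) of an idealized uniform tube closed at one end.

    Quarter-wavelength model: F_n = (2n - 1) * c / (4 * L).
    The default speed of sound is an illustrative assumption.
    """
    return [(2 * n - 1) * speed_of_sound_cm_s / (4.0 * length_cm)
            for n in range(1, n_formants + 1)]


# Assumed tract lengths, chosen only to show the direction of the effect.
longer_tract = tube_formants(17.5)
shorter_tract = tube_formants(14.5)

print("F1, longer tube: ", longer_tract[0])
print("F1, shorter tube:", shorter_tract[0])
```

Under these assumptions the shorter tube yields the higher first resonance, which is the direction of the male/female F1 difference reported for Analysis #3. The model holds tract shape constant, so it says nothing about habitual tongue or larynx posture.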
In the analysis of physical and acoustic correlates to be discussed, no significant relationship was found between F1 and speaker height or neck length. Thus F1 values obtained for speakers in this study may be more reflective of vocal tract configuration and, more specifically, habitual tongue and larynx positioning during articulation. In the first two analyses, both larynx and tongue position perceptions provided discrimination between speaker groups.

Due to the nature of Discriminant Analysis, variables that are closely related may lose some of their discriminating power. This aspect of the analysis will be discussed in more detail in a subsequent section. Since F1 appears to covary with F0 due to the tendency of vocal tract size and laryngeal size to covary, its independent discriminating ability may have been neutralized. However, the fact that it appeared in the discriminant function once F0 had been removed attests to its importance. Furthermore, F1 was found to be inversely related to trained listener judgements of tongue positioning along the front/back dimension (rho = -0.79). Although this relationship was not originally hypothesized, this result may reflect the lack of variance between the two perceptual judgements of tongue position, TFB and TRL. A strong relationship between TFB and TRL was demonstrated (rho = 0.83), indicating the listeners’ difficulty in perceptually distinguishing between the two dimensions. Neither of the tongue position measures showed a significant correlation to the perceptual measure of larynx height. Thus, F1 appears to be reflecting a habitual articulatory posture of the tongue; however, from the results presented here, it is not clear whether it is the degree of front/back posture or the extent to which the tongue is raised or lowered.
Further research is required to clarify this issue; however, it appears that F1 corresponds in some way to habitual articulatory posture of the speaker.

The final analysis was performed to determine if there were any factors that separated the MTS speakers from both male and female groups when F0 had been eliminated from the variable pool.

Analysis #4: This analysis provided further information regarding variable relationships inferred in Analysis #2 with F0 excluded. The final discriminant analysis utilized all subject data and separated it into three groups, producing two discriminant functions. The analysis identified two variables that best discriminated between the groups after controlling for F0. These variables were the extent of F0 shift relative to overall F0 and SNR. The classification functions derived for this analysis correctly categorized seven of the female speakers and eight of the male speakers. Seven of the MTS speakers were placed in a separate MTS group based on the above variables.

This proved to be the least powerful of the discriminating conditions, with the two discriminant functions accounting for only 33% and 31% of variance respectively. Two of the misclassified female speakers were placed in the MTS group, while the other two were placed in the male group. Two of the male speakers were placed in the MTS group and one in the female group. Of the MTS speakers, two were classified as female and one as male.

The new variable entering into the discriminant function in this analysis was the averaged extent of F0 shift (Ex/F0). As observed in Analysis #3, SNR reliably discriminated between male and female groups. When the analysis separated the data into three groups, it was Ex/F0 that differentiated the MTS group from both male and female groups. MTS scores on this variable were considerably higher than values found in either female or male groups. Four of the eleven female subjects had Ex/F0 scores closer to the average Ex/F0 score of male speakers.
Two of these, F19 and F20, also had low SNR scores. Because of this, they were classified as male. The other two, having relatively high SNR scores, were placed in the female group. Similarly, the two male subjects with the highest SNR scores were classified as female. The male subject with the next highest SNR score (9.9) also had the highest Ex/F0 score of the male speakers and was therefore categorized in the MTS group.

From this analysis, we can see that the extent of F0 shift relative to F0 average is a significant factor in discriminating MTS speakers from both male and female groups, as defined by the discriminant functions. This suggests that MTS speakers may be using specific intonation patterns that set them apart from anatomical females and males. The MTS speakers in this study used an intonation pattern characterized by extreme fluctuations in F0, with rises of up to 115 Hz, during key words in the target sentence. This may reflect the speaker’s best attempt to approximate female voice. Based on the gender stereotype that females use more variable intonation patterns in their speech, the MTS speakers demonstrating the greatest F0 fluctuations were apparently exaggerating patterns that they felt would be more typical of female speakers in an effort to sound more feminine.

In the analyses separating the data into only two groups, this measure of intonational variability did not demonstrate reliable discriminating power between male and female speakers. This indicates that the stereotyped belief may not hold in actuality, although more research in this area is clearly required.
The implication of this finding for the clinician is that rather than working on increasing the extent of F0 fluctuations or intonational variability in the speech of MTS clients, it might be more effective to discuss the differences between gender markers and gender stereotypes in speech and, in certain cases, to reduce the extent of F0 fluctuations used by the MTS speaker, as these may sound unnatural and may be the cues that listeners are using to decide the speaker is not anatomically female.

The Nature of the Discriminant Analysis, its Limitations, and Interpretation:

Results of the discriminant analyses reported here suggest that a small subset of the variables entered into the discriminant functions are the most useful in separating the speaker groups. However, the way in which the discriminant function analysis is performed may actually obscure the importance of some of the variables that were hypothesized to have discriminating power.

The discriminant analysis is a stepped procedure that calculates the independent contribution of each variable to the separation between groups at each step. Before the stepping procedure begins, univariate F values are calculated. These values are used to determine the most discriminating variable of those under consideration. This variable is then entered into the discriminant function and removed from the variable pool. At the next level of the analysis the F values are calculated again to determine which variable might contribute further to the separation between groups. If another variable with such discriminating power exists, it is then entered into the discriminant function and so on. What is not clear from this description is the fact that only the first calculation of the F values appears to be a univariate analysis.
Subsequent analyses continue to be influenced by the variables that have been entered into the discriminant function.

The effect of this is that preliminary calculation of F values identified a number of variables that have statistically significant discriminative power. These values were recorded for Analysis #1 in Table 7. The F scores can be rank ordered, revealing which variables independently discriminate between the groups. For Analysis #1 the variables in order of importance were F0 average, SNR, F0 range, F1, and rate of F0 shift. Of these five, only three appeared again in any of the analyses as having discriminative power: F0 average, SNR, and F1. This highlights the close relationship of the five variables listed. Because SNR, F0 range, F1 and rate of F0 shift are in some way related to F0 average, when this variable is entered into the function, none of the other apparently powerful variables continue to make an independent contribution to the separation between groups. F0 average, being by far the most powerful, neutralizes the discriminative power of the other four variables.

Although the process of discriminant analysis necessarily selects the most powerful variables, that is not to say others do not contribute to group discrimination. F0 average is the dominant variable in the first two analyses in distinguishing between the groups. This finding supports past research suggesting that F0 is the primary cue to gender identification. The relative importance of the other voice and speech characteristics is not that easily determined based on the results of the present investigation. In the absence of F0, SNR received the highest preliminary F score and therefore appeared as the dominant discriminating factor between male and female. Similarly, with F0 controlled for, Ex/F0 was the most important feature separating the MTS group from male and female groups as defined by the discriminant function.
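The stepping behaviour described above, in which a variable with a large univariate F value stops contributing once a related variable has been entered, can be sketched in code. This is an illustration under stated assumptions rather than the procedure used in the study: the partial F ("F-to-enter") is computed here from the ratio of Wilks' lambda values, which is one common formulation, and the software criterion actually applied is not specified in this section. The function names, the entry threshold of 4.0, and the eight-speaker data set are all hypothetical and were constructed only to make the effect visible.

```python
import numpy as np


def wilks_lambda(X, groups):
    """Wilks' lambda, det(W) / det(T), for the columns of X."""
    X = np.asarray(X, dtype=float)
    total_dev = X - X.mean(axis=0)
    T = total_dev.T @ total_dev          # total sums of squares and cross-products
    W = np.zeros_like(T)                 # pooled within-group counterpart
    for g in np.unique(groups):
        rows = X[groups == g]
        within_dev = rows - rows.mean(axis=0)
        W += within_dev.T @ within_dev
    return np.linalg.det(W) / np.linalg.det(T)


def f_to_enter(X, groups, entered, candidate):
    """Partial F for adding `candidate` given the already entered columns.

    With nothing entered this reduces to the univariate (one-way ANOVA) F.
    """
    n, k, p = X.shape[0], len(np.unique(groups)), len(entered)
    lam_old = wilks_lambda(X[:, entered], groups) if entered else 1.0
    lam_new = wilks_lambda(X[:, entered + [candidate]], groups)
    ratio = lam_new / lam_old
    return (n - k - p) / (k - 1) * (1.0 - ratio) / ratio


def forward_stepwise(X, groups, names, threshold=4.0):
    """Enter the variable with the largest F-to-enter until none passes."""
    entered, remaining, history = [], list(range(X.shape[1])), []
    while remaining:
        scores = {j: f_to_enter(X, groups, entered, j) for j in remaining}
        best = max(scores, key=scores.get)
        if scores[best] < threshold:     # hypothetical entry criterion
            break
        entered.append(best)
        remaining.remove(best)
        history.append((names[best], scores[best]))
    return history


# Constructed data: two groups of four speakers. The second variable is
# the first plus a perturbation that carries no group information.
groups = np.array([0, 0, 0, 0, 1, 1, 1, 1])
f0_average = np.array([100., 110., 120., 130., 200., 210., 220., 230.])
related = f0_average + np.array([-10., 10., 10., -10.] * 2)
X = np.column_stack([f0_average, related])
names = ["f0_average", "related_measure"]

univariate = {names[j]: f_to_enter(X, groups, [], j) for j in range(2)}
selected = forward_stepwise(X, groups, names)
print(univariate)
print(selected)
```

In this constructed example both variables separate the groups well on their own, but once `f0_average` has been entered the F-to-enter of the second variable falls to approximately zero, so it never enters the function. The study's variables are unlikely to behave this cleanly; the sketch only shows why a large preliminary F value does not guarantee a place in the final discriminant function.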
Clearly, F0 average, SNR and Ex/F0 are statistically very important in defining speaker groups; however, further study is required to determine the relative importance of the other voice and speech characteristics.

Acoustic and Perceptual Correlates

The Discriminant Analyses provided information regarding which voice and speech characteristics reliably classified male, female and MTS voice. The results suggested that average speaker F0 and SNR are very important in separating male from female voice and that F0 and Ex/F0 are the variables that most reliably set the MTS speakers apart from both anatomical male and female speakers. As this conclusion is based on the statistical power of these variables, it is difficult to know whether these are the features that the listener is making use of to decide speaker gender. In order to further investigate what the listener is basing gender judgements on, a correlational analysis between acoustic and perceptual variables was performed.

It was hypothesized that certain perceptual variables would be correlated to acoustic variables based on past research suggesting that such relationships exist. Results of the correlational analysis demonstrated that the acoustic measure of F0 range used by speakers was reliably related to perceptual judgements of pitch range. Statistically significant correlations were also found between F1 and the judgement of raised/lowered tongue body position, and between the perceptual judgement of pitch variability and both the acoustic measures of rate of F0 shift and the extent of F0 shift relative to average F0. Interestingly, the expected correlation between F0 average and the perceptual judgement of mean pitch was not found in the data presented here. In fact, there was a small tendency for speakers with higher F0 averages to be rated as having a lower mean pitch and for those with a lower F0 to be rated as having a higher pitch. This may be explicable considering the complex way in which acoustic and perceptual variables interact.
Researchers have found that changes to certain acoustic voice characteristics will have an effect on constellations of perceptual variables (Klatt and Klatt, 1990; Wolfe and Ratusnik, 1988; Coleman, 1971). This is of particular importance to the present investigation because the MTS speakers are altering their voices primarily by raising their habitual F0. However, in so doing, they may be increasing laryngeal tension and vocal tract constriction (Laver, 1980). This may in turn influence voice quality. Thus, there is a potential interaction between the acoustic output associated with these mechanical changes that may impact on listener perception (Wolfe and Ratusnik, 1988).

It is important for the clinician to be aware of possible interactions of acoustic variables when evaluating the voices of clients experiencing any vocal dysfunction, but it is of particular importance in working with MTS speakers who wish to achieve a more feminine voice quality. Altering average speaker F0 alone may not be sufficient for the successful attainment of this goal. Because what we perceive as listeners is the end product of a complex interaction of acoustic events, therapy should target constellations of related voice characteristics rather than independent acoustic variables. So, for example, while the client is working on raising F0, due attention should be paid at the same time to the way in which this change is affecting different acoustic variables, such as SNR and jitter, and how these changes in turn are affecting a listener’s perception of the voice, i.e. is the voice sounding rougher or has the resonance changed. By targeting combinations of variables, treatment will ultimately be more effective.

In addition to the interactive nature of acoustic and perceptual variables, it is possible that the unexpected relationship revealed between F0 and mean pitch reflected a potentially confounding methodological problem. Trained listeners were required to compare both anatomical gender groups and the MTSs.
The listeners were apparently making an initial gender judgement and then rating the voices based on how the speaker related to other speakers in that assumed gender group rather than on a continuum of human voices. This issue will be addressed in more detail in the methodological considerations section.

Physical and Acoustic/Perceptual Correlations

Before leaving the question of how acoustic and perceptual variables are related, we will expand the scope of this examination to include the possible restrictive influence of physiology and anatomy. The physical dimensions of the vocal tract and larynx will affect the acoustic output generated by a given vocal system. Numerous studies have shown a direct relationship between physical size of various vocal structures and the frequency of F0 and formants. For example, systematic relationships have been found between vocal fold dimensions of thickness and length and F0 of phonation (Hollien, 1960a, 1960b; Titze, 1989); and overall laryngeal size has been shown to be reliably related to judgements of vocal pitch (Hollien, 1962). In addition, Titze (1989) found that overall larynx size was systematically related to sound power, mean airflow, glottal efficiency and amplitude of vibration. Research into formant values has highlighted relationships between physiology and acoustics. Acoustic output has been explained by placement of articulators and the resultant modifications to the length and opening of the vocal tract, and by anatomical size and shape (Kent & Read, 1992; Borden & Harris, 1984).

The research to date suggests that modification of such voice characteristics as F0 and formant frequencies may be limited because of anatomical and physiological factors. For the clinician working with MTS clients, it is important to consider the extent to which acoustic output is restricted by anatomical structure, as evidence of such limitations would put into question the possible effectiveness of therapy with this population.
A further correlational analysis was performed to investigate correlations between structural features, such as height, weight, neck length and neck circumference, and the acoustic variables of F0 and formant frequencies. In order to capture the possible effect of physical characteristics on listener judgements of vocal pitch and resonance, the perceptual variables hypothesized to be related to these acoustic features were also analyzed.

As reported in Chapter 3, the only relationship found was between F0 and neck circumference (rho = -0.67). The interpretation of this relationship is difficult as neck circumference provides only a very crude measure of the speaker’s physical size. It is possible that the measure of neck circumference provides a gross estimate of laryngeal size. The inverse nature of the relationship suggests that as laryngeal size increases, F0 decreases. More sophisticated measures of anatomical structures might reveal other relationships that were not found in this analysis.

The fact that the present study did not demonstrate significant relationships between physical and acoustic/perceptual variables does not mean that physical structure does not limit what a given speaker is able to do with his or her voice. However, it does suggest that it may be possible for a particular speaker, for example, an anatomical male speaker, to change his voice to a certain extent within the parameters defined by his anatomy.

From informal observations, it is clear that the human vocal system is extremely flexible. Experience with our own voices and those of others demonstrates that we are capable of producing an enormous range of frequencies and voice qualities, e.g. female impersonators. The fact that there are anatomical males who are able to produce a convincingly female voice provides evidence that this is so. This is not to say that all MTS clients wishing to alter their voices will be successful.
Each individual will be restricted to a greater or lesser extent by her own anatomy, a fact that should be made clear to the client beginning treatment. However, with proper training, it should be possible for most clients to achieve a more feminine voice without the surgical alteration of the vocal folds.

The “Success” of MTS Speakers and its Relation to Speech/Voice Features

Preliminary Subject Interview:

The preliminary MTS subject screening interviews revealed that achieving a more “female sounding” voice was very important to eight of the eleven MTSs included in the study, both in work and social situations. One subject, MTS8, stated that her voice was not a real concern, but that she had changed her voice since assuming a female identity. MTS subjects were asked what conscious measures they had taken to accomplish this change. Seven of the subjects had not had any professional assistance in changing their voices. Most of these subjects stated that they tried to speak with a higher pitch, to use a “softer” voice and to speak more slowly, articulating and emphasizing words more carefully. Several subjects said they tried to speak in a more ‘musical’ way, suggesting they would try to exaggerate the prosodic contour of their speech. One subject, MTS11, said that she did not try to raise her pitch, but rather she tried to speak more softly so that there was less “vibration” in her voice, while using a “normal voice”. Four subjects, MTS2, MTS3, MTS6, and MTS7, had worked with an S-LP. The treatment period for these subjects ranged from 3 months to 2.5 years. These subjects reported that primary goals in treatment included speaking consistently with a higher pitched voice and changing the prosodic pattern of their speech to more closely resemble a female pattern.
These preliminary interviews revealed the importance of adopting a female voice and speech pattern for this population and the frustration associated with difficulty in effecting this change.

In order to more fully understand the definition of female voice, one final analysis was performed. This portion of the investigation involved a more systematic look at the way in which naive listeners judge speaker gender. The hypothesis was that MTS speakers who were “successful”, based on the frequency with which naive listeners identified them as female, would have voice and speech characteristics more closely resembling those of anatomical females. Those MTS speakers who were not as successful were expected to have characteristics more representative of male patterns.

Naive listeners assigned a gender to all thirty-three subject voices. The MTS speakers who were rated the majority of the time as female were considered to be “successful”. By comparing “successful” MTS voices to voices of anatomical males and females, it was hoped that we would be able to determine the characteristics that listeners are using to make a gender decision and that the attributes found to be of importance in this analysis would correspond with those found to have discriminative power in the Discriminant Analyses.

Results of this analysis demonstrated that only one MTS speaker, MTS2, was 100% successful, with all 12 naive listeners rating this individual as “female”. Comparing this information to the gender categorizations made by the Discriminant Analyses, it is apparent that all three of the MTS speakers most often rated as female (MTS2, MTS5, and MTS7) were also most often classified as female based on acoustic and perceptual characteristics of their voices.
This comparison is presented in Table 13.

Table 13: Comparison of the Gender Categorization of MTS Speakers based on the Discriminant Analyses and Naive Listener Judgments

           Discriminant Analyses:
           Total Categorizations      Percentage of
Subject    Female   Male   MTS        Female Judgments
MTS1       0        2      2          75
MTS2       3        0      1          100
MTS3       0        2      2          50
MTS4       1        1      2          75
MTS5       2        0      2          92
MTS6       0        2      2          33
MTS7       2        0      2          92
MTS8       0        3      1          8
MTS10      2        1      1          50
MTS11      0        3      1          0

This suggests that the acoustic and perceptual variables highlighted by the Discriminant Analyses as having discriminative power may be employed by naive listeners to make a gender decision.

There were somewhat contradictory results for MTS1 and MTS10. Based on acoustic and perceptual data, MTS1 was never included in the female group; however, 75% of listeners rated this subject as female. On the other hand, the discriminant analyses categorized MTS10 as female twice, but only 50% of the listeners rated this speaker as female. For these speakers, it appears that there might have been some overriding voice or speech characteristic that listeners were relying on to make a gender decision. The unexpected result presented here may reflect the difficulty of the task.

The confidence associated with any gender choice made will be dependent on how ambiguous the voice and speech characteristics are for a given speaker. However, in a forced-choice situation, the difficulty faced by the listener in making this decision is obscured. Therefore, a bipolar judgement of speaker gender may be somewhat misleading. It does not reflect how representative of a given gender an individual voice is felt to be. Thus, it might provide overly optimistic or pessimistic data with regard to the success of MTS speakers. In order to evaluate the strength of the gender judgements obtained, an analysis of the degree of masculinity/femininity represented by each voice was performed.
The averaged scores are presented in Appendix 6. The majority of MTS speakers, with the exception of MTS8, 9, and 11, received scores on the feminine side of the rating scale. However, unlike those of the majority of anatomical female and male speakers, the tendency was for MTS voices to obtain scores falling in the ambiguous area toward the middle of the scale. This suggests that although a speaker’s gender has been identified as female, differences in voice and speech characteristics persist that make this speaker less representative of a given gender than others. The fact that MTS2, 5 and 7 received the highest femininity scores in the MTS group, however, provides further confirmation that these speakers were the most successful of the MTS group.

Finally, a correlational analysis was performed to explore the possible relationship between acoustic/perceptual voice and speech characteristics and masculine/feminine judgements made by naive listeners. The underlying assumption was that if a relationship existed between the acoustic and perceptual features under investigation and the judgements of naive listeners, this would support the suggestion that these features are important to gender identification. All subject data was used in this analysis because the way in which MTS subjects were being rated was also of interest. This analysis revealed moderate to strong relationships between speaker masculinity/femininity scores and the following variables: F0 average, F0 range, F1, SNR, and rate of F0 shift. The strongest correlation found was that between F0 average and masculinity/femininity judgement. This relationship is shown in Figure 9.

Fig. 9: Relationship between listener judgments of masculinity/femininity and F0 average by group (horizontal axis: F0 Average).

Whether or not the listener is aware of what factors are influencing his or her decision, all of these acoustic features were significantly related to the degree of masculinity or femininity represented in the voice. Interestingly, these are the same variables receiving significant F scores in the univariate preliminary analysis of Discriminant Analysis #1, reported in a previous section. This provides further justification for targeting these features in therapy.

Methodological Considerations:

Sample Size

Due to the small sample size, it is necessary to use caution in extending the information presented here to the general population. The statistical results do not carry as much weight as those of a study using larger numbers would. For this reason, much of the present discussion has focused on individual cases. However, because there are similarities between results obtained from this study and those found in previous work, we can interpret the data with a reasonable amount of confidence.

Listener Rating Scales

There has been considerable debate regarding the validity of perceptual rating scales. Kreiman et al. (1993) highlight some of the difficulties that have plagued perceptual methods. In order to control for these problems, these researchers suggest the use of rating scales having a solid theoretical basis. The Vocal Profile Analysis is such a protocol. It has been developed for use in rating both normal and pathological voices. In spite of the suitability of this protocol, the trained listeners demonstrated some difficulty in providing certain judgements. They expressed concern when rating the MTS subjects and other voices that exhibited some ambiguity, particularly when asked to rate such characteristics as mean pitch, pitch variability and pitch range, variables that the raters felt would be highly correlated with the gender of the speaker.
Thus, in the case of the more ambiguous voices, the trained listeners demonstrated uncertainty regarding their judgements; they often noted the gender that they had decided a speaker to be before making a perceptual rating. Clearly, although the VPA is a good tool for the analysis of normal and pathological voice and was designed for that purpose, it was not designed to provide specific information with regard to speaker gender. Further modification to the instructions given to raters and a better understanding of how listeners are rating voices are required in order for the perceptual measures obtained in this type of research to be interpreted with confidence.

Because of the ambiguous nature of many of the voices in this study, it is important to consider more closely not just the characteristics that are being relied on as cues in the decision making process, but the way in which listeners are making decisions. From the results of the trained listener analysis it was apparent that there were two distinct decisions that were being made about each voice, one regarding the speaker’s gender and one regarding the gender to which the voice belongs. The distinction between these two decisions may appear to be a subtle one; however, it affects the way in which the data is to be interpreted.

In the most common case, when a voice is heard, listeners will simultaneously judge the gender of the voice and the speaker. Because most voices fall clearly within the boundaries of the listener’s definition of male vs. female, an instantaneous categorical decision of the speaker’s gender can be made. In this situation, it is difficult to see that the listener is really making two decisions.
However, when a voice is more ambiguous, that is, when it has certain characteristics that could be considered male as well as having characteristics that could be considered female, the decision becomes more clearly a two part one.

There appears to be a process of evaluation, not necessarily conscious, whereby various characteristics of the voice are weighed so that the ultimate decision of speaker gender can be made. During this process, the listener may be unsure of speaker gender but may feel the voice falls within the range of voices s/he has experienced to belong to a particular gender. For example, one of the trained raters in the study decided to treat a particular voice as male but noted that she had heard women who sounded like this. Apparently, this voice fell within the range of female voices she had experienced, but, for other reasons, the speaker’s gender was assumed to be male. This clearly demonstrates the divisibility of these two decisions.

Researchers in this field of investigation need to be aware of the nature of the decision that has been made, as the preliminary decisions made by the listener will affect the outcome. It would be advisable to control for the way in which the decisions are made by explicitly stating the criteria for the judgement. For example, specify that listeners judge the voice based on what they believe the anatomical gender of the speaker to be or specify that they base their judgements on the range of male and female voices they have experienced without making a bipolar gender judgement.

This difficulty could also be controlled by using trained phoneticians to rate speaker voices. Unlike speech-language pathologists with extensive experience who have been trained to evaluate voices for vocal pathology relative to well defined internal norms, phoneticians might be better able to rate the vocal characteristics without first making that gender judgement.
In future studies, attention to these issues will provide more reliable and more easily interpretable results.

Directions for Future Research

This investigation has attempted to address some of the current issues in the area of gender markers in voice and speech; however, much work remains to be done. The database on female voice is expanding. This is in large part due to the growing field of voice synthesis. For researchers and professionals working to improve the synthesis of human voice, an understanding of the differences between male and female voice is paramount to the perfection of such systems.

From the data presented above, the possibility of a hierarchy of acoustic and perceptual variables defining male and female voice is not yet clear. Due to the nature of the Discriminant Analysis, there is some ambiguity regarding the relative importance of the variables. Preliminary univariate analysis in Analysis #1 produced five variables that reliably discriminated between male and female speakers; however, not all of these variables were ultimately entered into the discriminant functions. The correlations found between these variables and naive listener judgements of masculinity/femininity point to their importance. Further investigation is necessary to confirm the relative importance of these features in a nonstatistical way. That is, because of the interrelated nature of these variables, removing the data set of one acoustic feature from the statistical analysis will not remove its influence on the other variables. Manipulating voice samples so that specific acoustic and perceptual variables are controlled for might provide clearer evidence for the relative importance of these features in gender identification.

This study has sought to further define female voice both acoustically and perceptually. Results presented here have indicated the difficulties inherent in perceptual rating scales and definitions.
That is, voice qualities perceived by the listener are not the direct result of a single acoustic characteristic; there is no one-to-one correlation. Instead, there appears to be a complex interaction among these variables. Further research is required to explore acoustic and perceptual covariates. Specific research questions would include: Are there specific constellations of acoustic variables that covary, for example, F0, F1, and SNR, and what is the nature of these relationships? What sets of acoustic variables determine our perception of vocal characteristics such as pitch, roughness, breathiness? How do changes in the acoustic signal affect our perceptual judgements?

Given the weakness discussed above with regard to the decision-making process in perceptual ratings of voice, it would be valuable to construct a study addressing this issue. The present study could be repeated using trained phoneticians in order to control for the effect of listeners making a preliminary gender judgement before rating the voices or by making the criteria for rating the voices more restrictive. This might reveal trends in the perceptual data that may have been obscured in this investigation.

Intonational differences between male and female speakers were not shown to reliably distinguish between the genders in this study. However, examination of the raw data suggests that there is a tendency for females to use more variable intonation patterns than males. Because this observation is based on the intonational information derived from two key words in the target sentence, it is not possible to extend the significance of this finding beyond the present study.
This was not a study intended to focus on the complex differences that exist between genders at a discourse level, and therefore, it is difficult to make any conclusive statements in this regard. Results of the present investigation do suggest that further examination of the gender specific characteristics at the discourse level is merited: Are there specific intonation patterns that characterize female speakers as distinct from male speakers at a discourse level? What is the nature of these differences? Are there specific intonational patterns used by the different genders across cultures?

Beyond the suprasegmental level, there are many questions regarding gender differences in language use and content that are still to be answered: Do female speakers preferentially produce certain syntactic structures over others? Do women and men differ in their choice of words and vocabulary usage? Do men speak more quickly with fewer pauses than women? Do males and females use the same cooperative principles in their conversations? Are men and women aware of the differences that potentially exist? This research would provide valuable information regarding the validity of stereotyped beliefs and the power of these beliefs on the manner of speech adopted by speakers of different sexes.

Finally, there is yet much to be learned about MTS voice and the ways in which to assist these individuals in achieving a more feminine voice. More detailed consideration of the aspects of MTS voice that set it apart from both anatomical male and female speakers will enable the professional working with the MTS client to eliminate atypical voice and speech patterns to facilitate attainment of a successfully female voice. No independent measure of laryngeal tension was obtained for this study; however, results presented here suggest that this may be a perceptual cue to listeners that the speaker is not anatomically female.
Research focusing on the success of MTS speakers relative to laryngeal tension will demonstrate the importance of training MTS clients in the use of proper vocal technique.

Little is known about the outcome of the therapeutic programme currently being used with this population. Comprehensive studies reporting details of therapies attempted and the effectiveness of these therapies are required. Further exploration into MTS vocabulary use and conversational style and content is also necessary. Research is needed in all of these areas in order for clinicians to provide more effective intervention.

CONCLUSION

The present study was designed to provide information of importance to speech scientists, rehabilitation specialists and other professionals in the remediation of gender related voice problems. It was also intended to contribute to the growing database for female voice and speech characteristics.

Results from the present study suggest that there are a number of acoustic and perceptual variables that differentiate female from male voice. The most important of these is average speaker F0. This is consistent with other studies that have shown F0 to be the dominant factor in listener identifications of speaker gender. The other variables that were found to have statistically reliable discriminative power are SNR, F1 and the perceptual judgement of larynx height.

Results of this study further demonstrate the lack of a one-to-one relationship between acoustic and perceptual variables. Although correlations were found between some of the hypothesized perceptual and acoustic covariates, certain features are apparently related in more complex ways.
Clinicians and voice scientists using perceptual measures in their evaluation and investigation of voice characteristics must be aware of how these variables interact and what influence the interactions have on the interpretation and use of the data obtained.

Information provided by this investigation suggests that in addition to the traditional approach of raising the habitual F0, treatment of MTS clients should include increasing F1 to values within the female range. This change may be achieved through the manipulation of tongue posture that appears to contribute to increased F1 values. Due attention should be given to ensuring that the F0 targeted is at a comfortable level for that speaker and that the change in F0 is achieved using proper vocal technique in order to control the laryngeal tension associated with muscle misuse. This precaution is recommended because of the results indicating that high speaker F0 averages associated with the perceptual judgement of raised larynx resulted in the misclassification of MTS speakers in this study.

Given the unexpected SNR scores obtained in this study, it is unclear how this variable is functioning in the discrimination between genders. Further study is warranted before recommendations can be made regarding the integration of this voice feature in treatment.

In addition, results provided here indicate that there are certain speech characteristics that separate MTS speakers from both anatomical male and female speakers. The results suggest that the reduction of the extent of F0 shifts used by MTS speakers may further help to prevent the misclassification of these speakers.
Targeting habitual tongue position in articulation may also be useful in preventing MTS speakers from being mistaken as male.

The research presented here was intended to increase our understanding of human voice characteristics to enable the clinician to better assess individuals experiencing gender-related voice disorders and to develop and implement therapy programmes to best serve the needs of clients. It is hoped that future investigations will expand on some of the issues raised by this investigation and provide additional normative data to the existing database.

APPENDICES

APPENDIX 1: Demographic Data for Subjects

S#   GROUP    AGE   HEIGHT   WEIGHT   NECK LENGTH   NECK CIRCUM.
                    (cm)     (kg)     (cm)          (cm)
1    MTS      44    164      62       14.5          35.1
2    MTS      43    180      68       18.3          38
3    MTS      44    178      73       17.3          36
4    MTS      26    168      57       14.6          35.4
5    MTS      38    180      84       16            38.5
6    MTS      40    168      66       14.7          35.7
7    MTS      45    178      76       16.2          37.6
8    MTS      45    178      64       16.3          34
9    MTS      35    185      89       16            37.3
10   MTS      37    170      64       15.6          35
11   MTS      25    183      68       19.2          34.8
1    Female   46    168      73       14.6          32.5
2    Female   29    160      44       15.5          28.5
3    Female   42    165      59       14.3          31.5
4    Female   30    172      61       14.3          31
5    Female   50    160      68       14.7          34.3
6    Female   52    150      45       11.4          32.6
7    Female   22    180      84       15.5          40.1
8    Female   45    168      56       14.2          32.6
9    Female   53    169      54       14            31.5
10   Female   31    175      64       15.6          33.8
11   Female   30    163      57       14.2          32.6
1    Male     25    188      68       17            35.5
2    Male     36    178      70       15.7          38
3    Male     36    182      79       16.4          41.1
4    Male     44    178      70       16.5          37.1
5    Male     37    173      77       14.7          42
6    Male     23    180      67       16.4          35
7    Male     25    165      56       15.2          34.5
8    Male     40    165      61       16.7          34.2
9    Male     41    178      75       14.7          37.4
10   Male     28    175      79       15.2          37.7
11   Male     31    178      89       16.5          39.5

APPENDIX 2: Acoustic Data

                                                     F0 Shifts
S#   Group    F0 Av.   F0 Rg.   F1     F2     SNR    Extent   Rate   Ext./F0
1    MTS      178      196      508    1504   11.3   91       0.51   0.51
2    MTS      165      84       625    1612   13.5   35       0.20   0.21
3    MTS      150      132      381    1533   12.8   115      1.02   0.77
4    MTS      145      85       605    1113   9.14   45       0.27   0.31
5    MTS      164      89       576    1582   11.6   53       0.45   0.32
6    MTS      144      125      576    1426   9.1    74       0.66   0.51
7    MTS      157      139      615    1338   16.2   96       0.32   0.61
8    MTS      110      60       508    1318   4      27       0.33   0.25
9    MTS      96       70       ****   ****   7.3    30       0.70   0.31
10   MTS      157      97       586    1445   14.8   49       0.34   0.31
11   MTS      91       49       566    1309   8.3    30       0.14   0.33
1    Female   174      117      576    1855   16.5   68       0.54   0.39
2    Female   189      149      781    1719   17.8   67       0.33   0.35
3    Female   180      186      586    1230   22.2   82       0.75   0.46
4    Female   191      35       605    1455   11.2   9        0.10   0.05
5    Female   177      48       527    1406   16.6   27       0.20   0.15
6    Female   175      87       742    1484   14.7   52       0.28   0.30
7    Female   204      140      645    1738   10.3   69       0.42   0.34
8    Female   173      65       527    1612   8.7    27       0.33   0.16
9    Female   146      59       547    1280   7.6    17       0.12   0.12
10   Female   156      99       586    1533   11.3   52       0.34   0.33
11   Female   163      157      703    1289   14.2   47       0.39   0.29
1    Male     127      58       586    1445   4.5    6        0.09   0.05
2    Male     92       28       449    1543   9.4    15       0.09   0.16
3    Male     104      52       508    1299   9.3    21       0.2    0.20
4    Male     89       32       566    1211   4.1    20       0.09   0.22
5    Male     88       26       547    1465   4.8    13       0.12   0.15
6    Male     101      53       508    1211   4.3    20       0.15   0.20
7    Male     108      53       508    1553   10.6   13       0.16   0.12
8    Male     118      88       527    1377   12.4   30       0.21   0.25
9    Male     142      28       488    1543   12.3   12       0.09   0.08
10   Male     112      81       625    1348   9.9    44       0.31   0.39
11   Male     90       70       508    1553   5.7    26       0.27   0.29

APPENDIX 3: Perceptual Rating Scale #1

Voice Number: _________    Rater Number: __________

1. High pitch       1 2 3 4 5 6 7 8 9   Low pitch
2. Clear            1 2 3 4 5 6 7 8 9   Hoarse
3. Breathy voice    1 2 3 4 5 6 7 8 9   Full voice
4. Feminine         1 2 3 4 5 6 7 8 9   Masculine
5. Slow rate        1 2 3 4 5 6 7 8 9   Rapid rate
6. Animated         1 2 3 4 5 6 7 8 9   Monotonous

APPENDIX 4: Perceptual Rating Scale #2

Rater Number: ____

Voice No.   Gender     Voice No.   Gender
1           F  M       21          F  M
2           F  M       22          F  M
3           F  M       23          F  M
4           F  M       24          F  M
5           F  M       25          F  M
6           F  M       26          F  M
7           F  M       27          F  M
8           F  M       28          F  M
9           F  M       29          F  M
10          F  M       30          F  M
11          F  M       31          F  M
12          F  M       32          F  M
13          F  M       33          F  M
14          F  M       34          F  M
15          F  M       35          F  M
16          F  M       36          F  M
17          F  M       37          F  M
18          F  M       38          F  M
19          F  M       39          F  M
20          F  M       40          F  M

APPENDIX 5: Modified VPA Protocol used by Trained Listeners

[Protocol form not recoverable from the source.]

APPENDIX 6: Perceptual Data

              Tongue         Larynx   Pitch Features
S#   Group    F/B     R/L    R/L      Mean   Range   Variability
1    MTS      2       2      1        2.67   2       2
2    MTS      3       3      2.5      3.33   3       2.83
3    MTS      2.25    2.25   2.5      1.67   2       2.33
4    MTS      3.25    3.25   3.5      3.83   2.88    2.83
5    MTS      3.5     3.5    3        3.33   2.5     3
6    MTS      2       2      2        1      2.25    2.33
7    MTS      3       3      3        3.33   2       1.67
8    MTS      1.75    1.75   2.5      2      3       3
9    MTS      1.75    2      3.5      2.33   2.75    3
10   MTS      2.5     2.5    2.5      1.67   2.5     2.33
11   MTS      2.25    2.25   3.5      2.67   3       3
1    Female   2.25    2.25   3        2.33   2.75    2.75
2    Female   2.25    2.25   3        2.83   2       2.88
3    Female   2.5     2.5    3        3      2       2.25
4    Female   3       3      3        3      3.75    3.75
5    Female   3       3      3        3      3.75    3.75
6    Female   2       2      3.5      3.33   2.88    3
7    Female   2       2      3        3      1.75    1.75
8    Female   3       3.33   3.25     3.33   3.5     3.75
9    Female   2.25    2.5    4        3.83   3       2.33
10   Female   2.75    3      3.5      4      2.88    3
11   Female   2.25    2.25   3        2.83   2.88    2.88
1    Male     2.75    2.75   3.5      2.83   2.67    3
2    Male     3.33    3.33   3.5      3.5    3       3
3    Male     3       3      3.5      3      3       3.75
4    Male     2.75    2.75   3        3      3.75    3.75
5    Male     2       2.75   4        4      3.75    4
6    Male     3.5     3.5    4        4      3.25    3.5
7    Male     3.25    3.25   3.75     4      3       3
8    Male     3       2      3        3      2.25    2
9    Male     3.25    3.25   2.5      2.5    4.5     4.75
10   Male     2       2      2.75     3      2.5     2.5
11   Male     2.75    2.75   4        4      3       3.25

APPENDIX 7: Pearson Product-Moment Correlations by Variable

Lingual Posture (tip/blade)
          Rater 1   Rater 2   Rater 3   Rater 4
Rater 1   1.0
Rater 2   .22       1.0
Rater 3   .47       .03       1.0
Rater 4   .63       .56       .5        1.0
Overall reliability: 0.4

Tongue Body Position (front/back)
          Rater 1   Rater 2   Rater 3   Rater 4
Rater 1   1.0
Rater 2   .77       1.0
Rater 3   .77       .56       1.0
Rater 4   .97       .73       .73       1.0
Overall reliability: 0.76

Tongue Body Position (raised/lowered)
          Rater 1   Rater 2   Rater 3   Rater 4
Rater 1   1.0
Rater 2   1.0       1.0
Rater 3   .74       .74       1.0
Rater 4   1.0       1.0       .74       1.0
Overall reliability: 0.87

Larynx Position (raised/lowered)
          Rater 1   Rater 2   Rater 3   Rater 4
Rater 1   1.0
Rater 2   .3        1.0
Rater 3   .53       .8        1.0
Rater 4   .3        1.0       .8        1.0
Overall reliability: 0.62

Whisper
          Rater 1   Rater 2   Rater 3   Rater 4
Rater 1   1.0
Rater 2   .37       1.0
Rater 3   .57       .35       1.0
Rater 4   .29       .17       .55       1.0
Overall reliability: 0.38

Mean Pitch
          Rater 1   Rater 2   Rater 3   Rater 4
Rater 1   1.0
Rater 2   .43       1.0
Rater 3   .6        .72       1.0
Rater 4   .75       .36       .52       1.0
Overall reliability: 0.56

Pitch Range
          Rater 1   Rater 2   Rater 3   Rater 4
Rater 1   1.0
Rater 2   .77       1.0
Rater 3   .91       .84       1.0
Rater 4   .91       .84       1.0       1.0
Overall reliability: 0.88

Pitch Variability
          Rater 1   Rater 2   Rater 3   Rater 4
Rater 1   1.0
Rater 2   1.0       1.0
Rater 3   1.0       1.0       1.0
Rater 4   .63       .63       .63       1.0
Overall reliability: 0.82

APPENDIX 8: Percentage of Interrater Agreement by Variable

Lingual Posture (tip/blade)
          Rater 1   Rater 2   Rater 3   Rater 4
Rater 1   100
Rater 2   52        100
Rater 3   61        55        100
Rater 4   70        79        67        100
Overall agreement: 64%

Tongue Body Position (front/back)
          Rater 1   Rater 2   Rater 3   Rater 4
Rater 1   100
Rater 2   97        100
Rater 3   97        94        100
Rater 4   97        94        94        100
Overall agreement: 96%

Tongue Body Position (raised/lowered)
          Rater 1   Rater 2   Rater 3   Rater 4
Rater 1   100
Rater 2   100       100
Rater 3   97        93        100
Rater 4   100       97        97        100
Overall agreement: 97%

Larynx Position (raised/lowered)
          Rater 1   Rater 2   Rater 3   Rater 4
Rater 1   100
Rater 2   70        100
Rater 3   88        96        100
Rater 4   84        100       97        100
Overall agreement: 89%

Whisper
          Rater 1   Rater 2   Rater 3   Rater 4
Rater 1   100
Rater 2   70        100
Rater 3   79        67        100
Rater 4   73        64        79        100
Overall agreement: 72%

Mean Pitch
          Rater 1   Rater 2   Rater 3   Rater 4
Rater 1   100
Rater 2   85        100
Rater 3   88        91        100
Rater 4   94        82        85        100
Overall agreement: 88%

Pitch Range
          Rater 1   Rater 2   Rater 3   Rater 4
Rater 1   100
Rater 2   94        100
Rater 3   97        97        100
Rater 4   97        97        100       100
Overall agreement: 97%

Pitch Variability
          Rater 1   Rater 2   Rater 3   Rater 4
Rater 1   100
Rater 2   100       100
Rater 3   100       100       100
Rater 4   94        94        94        100
Overall agreement: 97%

APPENDIX 9: The Gender Categorization of MTS Speakers based on the Discriminant Analyses

          Analysis #                 Total Categorizations
Subject   1     2     3     4        Female   Male   MTS
MTS1      M     MTS   M     MTS      0        2      2
MTS2      F     MTS   F     F        3        0      1
MTS3      M     MTS   M     MTS      0        2      2
MTS4      F     MTS   M     MTS      1        1      2
MTS5      F     MTS   F     MTS      2        0      2
MTS6      M     MTS   M     MTS      0        2      2
MTS7      F     MTS   F     MTS      2        0      2
MTS8      M     MTS   M     M        0        3      1
MTS10     M     MTS   F     F        2        1      1
MTS11     M     M     M     MTS      0        3      1

APPENDIX 10: Averaged Naive Listener Ratings for Masculinity/Femininity

Subj #                 MTS     Female   Male
1                      4.2     3.3      6.2
2                      3.8     1.08     8.4
3                      4.7     2.6      8
4                      4.2     2.9      7.3
5                      3.6     2.9      7.9
6                      3.9     2.2      8
7                      3.8     2.5      7
8                      5.5     3.8      7.1
9                      5.2     3.1      6.6
10                     4.1     3.8      6.1
11                     7.9     2.8      8
Group Means:           4.63    2.82     7.33
Standard Deviations:   1.241   .762     .794

1 = extremely feminine
8 = extremely masculine

APPENDIX 11

Figure A: Histogram of Anatomical Male and Female Speaker F0
[Histogram not recoverable from the source. Horizontal axis: 80 to 220 in steps of 10. Groups: a = Female, b = Male. Group means: 1 = Female, 2 = Male.]

Figure B: Histogram of Male, Anatomical Female and MTS Speaker F0
[Histogram not recoverable from the source. Horizontal axis: 80 to 220 in steps of 10. Groups: a = Female, b = Male, c = MTS. Group means: 1 = Female, 2 = Male, 3 = MTS.]
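
The group means and standard deviations reported in Appendix 10 can be checked directly from the individual ratings. The following is a minimal sketch in Python, not part of the original analysis (which used BMDP). The variable and function names are illustrative, the ratings are transcribed from Appendix 10, and the use of the sample standard deviation (n - 1 denominator) is an assumption about how the reported values were computed.

```python
from statistics import mean, stdev

# Averaged naive listener ratings, transcribed from Appendix 10
# (one value per speaker, 11 speakers per group).
ratings = {
    "MTS":    [4.2, 3.8, 4.7, 4.2, 3.6, 3.9, 3.8, 5.5, 5.2, 4.1, 7.9],
    "Female": [3.3, 1.08, 2.6, 2.9, 2.9, 2.2, 2.5, 3.8, 3.1, 3.8, 2.8],
    "Male":   [6.2, 8.4, 8.0, 7.3, 7.9, 8.0, 7.0, 7.1, 6.6, 6.1, 8.0],
}


def summarize(values):
    """Return (mean, sample standard deviation) for one group."""
    # stdev() divides by n - 1; this is an assumption about how the
    # standard deviations in Appendix 10 were obtained.
    return mean(values), stdev(values)


# Hypothetical helper structure: group name -> (mean, SD).
summary = {group: summarize(values) for group, values in ratings.items()}

for group, (m, sd) in summary.items():
    print(f"{group:<7} mean = {m:.2f}  SD = {sd:.3f}")
```

Under these assumptions the computed values round to the figures given in Appendix 10 (means of 4.63, 2.82 and 7.33; standard deviations of 1.241, .762 and .794), which suggests that the tabled ratings and summary statistics are internally consistent.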

