UBC Theses and Dissertations

The perception of disordered /[inverted r]/ of children in speech therapy by peers and speech-language… Perry, Benjamin 2007

THE PERCEPTION OF DISORDERED /ɹ/ OF CHILDREN IN SPEECH THERAPY BY PEERS AND SPEECH-LANGUAGE PATHOLOGISTS

by Benjamin Perry
B.Sc., University of Victoria, 2004

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate Studies (Audiology and Speech Sciences)

THE UNIVERSITY OF BRITISH COLUMBIA
June 2007
© Benjamin Perry 2007

Abstract

Subjective rating is the main method for measuring treatment effect for speech disorders in therapy and research. For complicated speech sounds such as /ɹ/, perceptual judgments by ear are subject to variability. The main goals of the current study were twofold: (1) to compare age peer and speech-language pathologist judgments of /ɹ/ as spoken by children receiving speech therapy for /ɹ/, and (2) to compare those listener judgments in both a single stimulus identification task and a two-stimulus paired comparison task. Sixteen syllables with /ɹ/ were presented by computer over headphones in the two tasks to 24 children (mean age 9 years) and 24 speech-language pathologists (SLPs). Variability, group and task differences were examined. Mean judgments of the tokens by SLPs and children were similar. Intra-rater reliability was better for SLPs than children. In the single stimulus identification task, SLPs also showed better inter-rater reliability than children. In the two-stimulus paired comparison task, SLPs and children had similar inter-rater reliability. Overall, the comparison task resulted in better inter-rater and intra-rater reliability for both groups. Implications for research and clinical evaluation of attempted /ɹ/ are discussed.

Table of Contents

Abstract
Table of Contents
List of Tables
List of Figures
Acknowledgements
1.1 General Introduction
    1.1.1 Impaired Production of /ɹ/
    1.1.2 Treatment of /ɹ/
    1.1.3 Judgment of /ɹ/
1.2 Theoretical Foundations and Questions
    1.2.1 Phonetics of /ɹ/
    1.2.2 Perceptual Studies of /ɹ/
    1.2.3 Theories of Phoneme Perception
    1.2.4 Variability in Category Boundaries
1.3 Goals and Questions for the Current Study
    1.3.1 What Makes a Reliable /ɹ/ Judgment Task?
    1.3.2 Questions and Predictions for the Current Study
Chapter 2: Methods
2.1 Participants
2.2 Procedures
    2.2.1 Stimuli Selection and Preparation
    2.2.2 Stimuli Acoustics
    2.2.4 Data Analysis Procedures
Chapter 3: Results
3.1 Analysis of Ratings Data
    3.1.1 Identification Task
    3.1.2 Comparison Task
3.2 Intra-Rater Reliability
    3.2.1 Identification Task
    3.2.2 Comparison Task
    3.2.3 Task Comparison
3.3 Inter-Rater Reliability
3.4 Fatigue or Experience Effect
3.5 Listener Variables
    3.5.1 Intra-Rater Reliability
    3.5.2 Identification Scores
    3.5.3 Comparison Scores
Chapter 4: Discussion
4.1 Goals and Questions for the Study
4.2 Discussion of Specific Results
    4.2.1 Listener Groups
    4.2.2 Task Comparison
    4.2.3 Fatigue or Experience Effect
4.3 Limitations to the Study
4.4 Implications of the Study
    4.4.1 Research Implications
    4.4.2 Theoretical Implications
    4.4.3 Clinical Implications
4.5 Conclusion
References
Appendix A: Questionnaire - Demographics
Appendix B: Stimulus Presentation Slides
Appendix C: Ethics Form

List of Tables

Table 1 Child Participant Information
Table 2 SLP Participant Information
Table 3 Acoustic Data
Table 4 Correlations for Children's Mean Identification Task and Acoustic Variables
Table 5 Correlations for SLP's Mean Identification Task and Acoustic Variables
Table 6 T-tests for Comparison Scores
Table 7 Correlations for Children's Mean Comparison Task and Acoustic Variables
Table 8 Correlations for Children's Mean Comparison Task and Acoustic Variables
Table 9 Inter-Rater Reliabilities By Number of Raters for Tasks and Groups
Table 10 MANOVA for Children Listener Variables and Comparison Scores
Table 11 MANOVA for SLP Listener Variables (version, age, native Canadian, language) and Comparison Scores
Table 12 MANOVA for SLP Listener Variables (parent, years experience with children) and Comparison Scores
Table 13 Mean Comparison Score and Position for Stimulus 2NAlar by Version

List of Figures

Figure 1 Mean Identification Score for SLPs and Children
Figure 2 Identification Score for Children by Stimulus Formant Height
Figure 3 Identification Score for SLPs by Stimulus Formant Height
Figure 4 Mean Comparison Score for SLPs and Children
Figure 5 Comparison Score for Children by Stimulus Formant Height
Figure 6 Comparison Score for SLPs by Stimulus Formant Height
Figure 7 Inter-Rater Reliabilities by Number of Raters for Tasks and Groups
Figure 8 SLP Age by Mean Versus Mean Comparison Score for 2NA2ar

Acknowledgements

I would like to acknowledge Penelope Bacsfalvi, Marcy Adler-Bock, Geeta Modha, and Barbara May Bernhardt for their research in ultrasound treatment for articulation disorders which provided the impetus for the present study. I would like to acknowledge committee members Barbara May Bernhardt, Jeff Small and Bryan Gick. I would like to acknowledge Bosko Radanov for help in data collection and analysis. I would like to acknowledge Maja Grubisjic for help with statistics. I would like to acknowledge the Vancouver YMCA after-school program for their participation. I would like to acknowledge my wife, Naomi Ogata.
Chapter 1: Introduction

1.1 General Introduction

The following document records a study that examines speech-language pathologists' and children's perception of normal and misarticulated North American English /ɹ/ as spoken by other children. The paper first introduces issues surrounding impaired /ɹ/ and its treatment. An overview of the literature concerning /ɹ/ perception is followed by a discussion of perceptual task designs, leading to a statement of the objectives and a description of the method and results of the study. The primary aim of the present study was to investigate factors affecting reliability in judgments about /ɹ/. A secondary aim was to gather more information about phonetic perception of /ɹ/.

1.1.1 Impaired Production of /ɹ/

Children's mispronunciation of /ɹ/ is very common during development. The most common substitution for /ɹ/ in syllable-initial position is [w], but other substitutions such as [j] and [l] occur. Occasionally stops and fricatives are also used in place of /ɹ/ (Bernhardt & Stemberger, 1998). There is also a substitution that has an /ɹ/-like quality but is not a fully accurate /ɹ/ (Shriberg, Flipsen, Karlsson, & McSweeny, 2001). In syllabic or post-vocalic positions, a common substitute is a schwa or a back rounded vowel (Bernhardt & Stemberger, 1998).

Misarticulation of /ɹ/ can become a long-standing problem for some children. It can lead to problems of intelligibility, but moreover, it can make them seem different from other children. This seemingly small 'error' is often difficult to treat due to the complicated, mostly non-visible nature of its articulation, with difficulties sometimes persisting into adulthood (Ruscello, 1995). The World Health Organization International Classification of Functioning, Disability and Health (2006) defines articulation as the functions of the production of speech sounds.
In the World Health Organization paradigm, an impairment to a bodily function leads to activity limitations and participation restrictions. Articulation impairment results in a limitation to the activity of speaking. Even though people with articulation disorders can still speak and be understood, there is some loss of intelligibility, even if the only misarticulated phoneme is /ɹ/. Perhaps an even more significant effect of speech impairment is found in the social realm. Hall (1991) studied the attitudes of early adolescents toward those with mild articulation disorders and found that there were significantly more negative attitudes towards children with articulation disorders than those without. With respect to the misarticulation of /ɹ/, it has been shown that teenagers who make substitutions for /ɹ/ also tend to be the target of negative behaviours (Silverman & Paulus, 1989).

1.1.2 Treatment of /ɹ/

Not all misarticulations of /ɹ/ respond well to conventional therapy. When conventional treatment methods are exhausted, innovative methods may be employed. These innovative methods include the use of oral appliances to train posture (Clark, Schwarz & Blakely, 1993), acoustic feedback (Shuster, Ruscello & Smith, 1992; Shuster, Ruscello & Toth, 1995) or feedback about tongue movement and position using ultrasound or electropalatography (Bernhardt, Bacsfalvi, Gick, Radanov & Williams, 2005; Adler-Bock, Bernhardt, Gick & Bacsfalvi, 2007).

1.1.3 Judgment of /ɹ/

When a speech-language pathologist (SLP) decides to work on a child's pronunciation of /ɹ/, someone has already made a judgment of an inaccurate /ɹ/ production. During therapy, a judgment is made with each attempted production. When making a decision about whether to move on in therapy, perceptual judgments are used. Perceptual judgments are also used in studies of treatment methods (e.g., Bernhardt et al., 2005; Adler-Bock et al., 2007).
Indications that perceptual judgments may be unreliable have become apparent in the undertaking of such studies (Bernhardt et al., 2005; Adler-Bock et al., 2007; Modha et al., in press). For example, in one task, Adler-Bock et al. (2007) reported on the percentage of a set of /ɹ/ productions that were judged to be on-target /ɹ/ by three raters. The raters scored 68.9%, 72.4%, and 44.8% as correct, with a maximum total agreement of 72.4%. Although other data sets within that study had higher rates of agreement, this example demonstrates that raters can vary considerably.

1.2 Theoretical Foundations and Questions

The following section examines theoretical issues related to the perception of /ɹ/ in adults and children. It will be shown that perception of /ɹ/ is prone to variability. In studying this variability, questions can be asked about designing reliable tasks for rating /ɹ/. Questions of variability also lead to general information about perception of /ɹ/ and speech perception in general.

1.2.1 Phonetics of /ɹ/

The symbol /ɹ/ usually denotes a consonant in onset position. A related sound is found in syllable rhymes. This syllabic variant is often transcribed as a vowel with a rhotic diacritic (i.e. unstressed [ɚ] and stressed [ɝ]). Whether in onset or coda position, key articulatory and acoustic features remain the same. For this paper, the symbol /ɹ/ will be used to refer to the group of rhotic phones.

1.2.1.1 Articulatory Phonetics of /ɹ/

Bernhardt et al. (2005) describe in detail the articulatory postures used in production of /ɹ/. There are constrictions at the lips (rounding for pre-vocalic /ɹ/), the palate and in the pharyngeal cavity. The root of the tongue is retracted into the pharynx. The palatal constriction can be made by the tongue tip in "retroflex" /ɹ/ or by the tongue blade in "bunched" /ɹ/. The tongue shows mid-line lowering behind the palatal constriction (Gick & Campbell, 2003). There is a lengthwise groove in the tongue.
Also, the sides of the posterior portions of the tongue brace themselves against the sides of the teeth (Bernhardt et al., 2005). The relationship between the articulatory posture and acoustic features is still being explored (Espy-Wilson, Boyce, Jackson, Narayanan & Alwan, 2000).

1.2.1.2 Acoustic Phonetics of /ɹ/

The most notable acoustic characteristic of /ɹ/ is a low F3, over an F1 and F2 similar to that of a central rounded vowel (Espy-Wilson et al., 2000). Flipsen, Shriberg, Weismer, Karlsson and McSweeny (2001) identified a number of other acoustic features that have been related to /ɹ/, such as F2 transition rate and duration, f0, F1, F2, F3 and F4 frequency, and the difference between F3 and F2 frequency. Comparison of these studies was made difficult by varied techniques for measuring the acoustic variables (Flipsen et al., 2001). The difference between F4 and F5 has recently been shown to distinguish retroflex and bunched variants of /ɹ/ (Espy-Wilson & Boyce, 1999; Hashi, Honda & Westbury, 2003; Zhou, Espy-Wilson, Tiede & Boyce, 2007). However, F3 seems to make the difference between perceiving /ɹ/ or perceiving its substitutions [w] and de-rhotacized /ɹ/ (Huer, 1989; Shuster, 1998). Additionally, there are indications that F2 may also play a role (Hoffman, Stager & Daniloff, 1983; Chaney, 1988; Wolfe, Martin, Borton & Youngblood 2003).

Lee, Potamianos and Narayanan (1999) provide age by gender data for a number of speech sounds for 436 boys and girls between the ages of 5 and 19. The data provide mean f0, F1, F2 and F3 data for the /ɝ/ vowel in the word bird. Generally, all three formant frequencies vary negatively with age, decreasing as children mature. For 8-year-olds saying /ɝ/, reported mean F3 was 2212 Hz (SD = 187) for boys and 2381 Hz (SD = 378) for girls. The mean F3-F2 difference was 517 Hz for boys and 668 Hz for girls.
For 12-year-olds saying /ɝ/, the reported mean F3 was 2007 Hz (SD = 170) for boys and 2247 Hz (SD = 398) for girls, and the mean F3-F2 difference was 585 Hz for boys and 477 Hz for girls.

1.2.2 Perceptual Studies of /ɹ/

During speech perception, individual phones can be identified in the speech signal. Although judgments about segments are affected by supplemental information, such as lexical information (Warren & Warren, 1970), acoustic information, such as formant frequencies and their correlative distinctive features, is used to make phoneme judgments (Espy-Wilson, 1992). Phoneme categories are established during infancy by a subtractive process; distinctions that are not in the native language heard by the infant become less perceptible (Werker & Desjardins, 1995). These categories are not uniform across people and environments, however, and certainly in the case of /ɹ/, considerable allophonic variation of perception has been demonstrated. What follows is a review of literature on the perception of /ɹ/.

1.2.2.1 Group Differences

There is considerable variability in the perception of /ɹ/ between different listeners and listener groups (Sharf, Ohde & Lehman, 1988). A number of researchers have reported perceptual judgments of /ɹ/ by children, with varying results. Children in some studies appear to have an /ɹ/ phoneme boundary more biased towards the /w/ end of the spectrum than adults, resulting in more judgments as /ɹ/ and fewer judgments as /w/ than adults (Menyuk & Anderson, 1969; Slawinski & Fitzgerald, 1998). Ohde and Sharf (1988) report the opposite bias in their discussion and abstract, but the data reported in the methods and results section seem to support the findings of the previously mentioned studies. Of these three studies, two (Menyuk & Anderson, 1969; Slawinski & Fitzgerald, 1998) used synthetic words and the other (Ohde & Sharf, 1988) used synthetic syllables with /ɹ/ in onset position.
The three studies determined the /ɹ-w/ categorization bias through an ANOVA, the phoneme boundary operationalized as the stimulus value at which 50% identification of /ɹ/ occurred.

Further to children's perception of /ɹ/, Hoffman, Daniloff, Bengoa and Schuckers (1985) found that many of a group of 22 6-year-old children who misarticulated /ɹ/ demonstrated only chance identification of syllables with onsets distributed along the /ɹ-w/ continuum. This was not the case for those who articulated /ɹ/ correctly. However, some misarticulating children have been shown to be more able to identify attempted articulation of /ɹ/, perhaps using non-standard markings. Hoffman, Stager and Daniloff (1983) had previously found that some children who misarticulated /ɹ/ were better at correctly identifying their own incorrect productions of /ɹ/ than other children and adults in an /ɹ/-/w/ minimal word pair picture pointing task. (Acoustically, a significant difference between the pronunciations of this subgroup and the rest of the subjects was the height of their F2 for /ɹ/, which was, on average, between the F2 height of normal /ɹ/ and /w/. Surprisingly, the F3 values for this subgroup were not reported.)

For adults, amount of clinical experience and phonetic training seem to be sources of variability in the perception of /ɹ/. In general, SLPs and SLP students have been shown to differ from others in that they are more likely to judge ambiguous /ɹ/ attempts as /w/ (Sharf & Benson, 1983; Chaney, 1988; Ohde & Sharf, 1988). Of these studies, two (Sharf & Benson, 1983; Ohde & Sharf, 1988) used synthetic syllables whereas Chaney (1988) used natural word stimuli. As noted above, the two studies that used synthetic syllables determined categorization bias through an ANOVA of phoneme boundary, which was operationalized as the stimulus number at which 50% identification occurred.
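The 50% criterion described above can be illustrated with a short sketch. This is a minimal illustration only, not the analysis used in any of the cited studies: the function name, the continuum steps and the identification proportions in the usage note are hypothetical, and linear interpolation between adjacent stimulus steps is an assumption about how a boundary between two steps might be located.

```python
from typing import Optional, Sequence


def boundary_at_50_percent(
    stimulus_steps: Sequence[float],
    proportion_r: Sequence[float],
) -> Optional[float]:
    """Return the stimulus value at which identification as /r/ crosses 0.5.

    stimulus_steps: positions along a hypothetical /w/-to-/r/ continuum.
    proportion_r:   proportion of responses labelled /r/ at each step.
    Returns None when the responses never reach or cross 0.5.
    """
    if len(stimulus_steps) != len(proportion_r):
        raise ValueError("inputs must have the same length")
    points = list(zip(stimulus_steps, proportion_r))
    for (x0, p0), (x1, p1) in zip(points, points[1:]):
        if p0 == 0.5:
            return float(x0)
        # A crossing lies between two steps when 0.5 falls strictly between them.
        if (p0 - 0.5) * (p1 - 0.5) < 0:
            # Assumed linear interpolation between the two neighbouring steps.
            return x0 + (0.5 - p0) * (x1 - x0) / (p1 - p0)
    if points and points[-1][1] == 0.5:
        return float(points[-1][0])
    return None
```

For example, with seven hypothetical steps and proportions of 0.05, 0.10, 0.20, 0.40, 0.80, 0.95 and 1.00, the crossing falls between steps 4 and 5. A listener group whose boundary sits nearer the /w/ end of such a continuum labels more of the ambiguous stimuli as /ɹ/.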
Chaney (1988) determined the bias in rating judgment through a post-hoc comparative analysis showing SLP ratings biased towards /w/ in comparison with those of parent and child ratings, which were more evenly distributed. (Statistical information on this difference was not reported.)

Further to variability and experience, Wolfe et al. (2003) found that individual graduate students with clinical experience specific to /ɹ/ had more unified phoneme boundaries across F2 and F3 continua when compared with other phonetically trained graduate students in a task that involved identifying a synthesized word as "reed" or "weed." However, the group with clinical experience in /ɹ/ also had the widest variation in phoneme boundaries, for the most part attributable to one listener with particularly high internal consistency in ratings, who varied from the group mean.

1.2.2.2 Intra-Rater Differences

The Wolfe et al. (2003) listener with high internal consistency points to another important aspect of /ɹ/ perception: intra-listener variability (Sharf et al., 1988; Shelton, Johnson & Arndt, 1974). For children, Sharf et al. (1988) found that those with problems articulating /ɹ/ were inconsistent in categorization of /w/ and /ɹ/ on an /ɹ-w/ continuum. For adults, however, Sharf and Benson (1982) were able to show listeners' variability stabilizing across sessions, particularly when the listeners were given feedback. Intra-rater difference has not been as widely discussed, however, as inter-rater or group differences.

Treatment studies often encounter raters with low intra-rater reliability. Although the lowest are often discarded from the study, studies still report low intra-rater reliability for some raters. For example, Modha et al. (in press) reported one listener with an intra-rater reliability of 74%.

1.2.2.3 Summary of Perceptual Studies of /ɹ/

The demonstrated inconsistencies in perceptual judgments suggest that perceptual judgments of /ɹ/ are quite unreliable.
Certainly, reliability is an issue in research and therapy; however, the research studies reviewed so far have not provided much information about what might enhance reliability of judgments. Reliability is important in research because reliable /ɹ/ judgments are needed in order to make well-founded claims about /ɹ/ production. In therapy also, it is important to know what procedure is reliable for tracking progress in a client's production of /ɹ/. Interestingly, SLPs, who are usually making judgments in treatment and research, have been shown to differ from others who will be making most of the judgments in daily life. That raises a general question about who should be making judgments about effectiveness of treatment and why. Comparative information about similarities and differences in SLPs' and others' judgments of /ɹ/ will help SLPs make more informed judgments in treatment and research. In addition, this information can lead to more general information about speech perception.

1.2.3 Theories of Phoneme Perception

A starting point regarding theories of phoneme perception was the theory of Categorical Perception (Liberman, Harris, Hoffman & Griffith, 1957). The theory of Categorical Perception describes the human tendency to perceive and classify a phone as belonging to one category or another. For Liberman et al. (1957) a key feature of categorical perception was that discrimination across a category boundary was better than discrimination within categories. More recently, researchers have attempted to explain listeners' judgments by referring to exemplar- or prototype-based phonemic representations. For example, an exemplar-based model of speech perception such as the Native Language Magnet (NLM, Kuhl, 1993) describes a compressed perceptual space surrounding the best exemplars of a particular phoneme, resulting in reduced sensitivity to differences near the best exemplars of that phoneme.
Exemplar models, which theorize that speech sounds are compared to all stored exemplars of that sound, differ slightly from prototype models, which theorize that speech sounds are compared to a prototype that represents an average example of that particular speech sound. Exemplar models such as the Native Language Magnet and prototype models (e.g. the Fuzzy Logic Model, Massaro, 1992) do not rely on phoneme boundaries to explain phoneme discrimination. The Fuzzy Logic Model explains category judgments in terms of a probabilistic similarity to prototypes along various feature continua. Massaro (1992) disputes the existence of categorical perception. Iverson and Kuhl (1995), on the other hand, found evidence that similarity to good exemplars (category goodness), in addition to category boundaries, was apparent in identification and discrimination of an /r-l/ continuum. Gerrits and Schouten (2004) challenge the notion of a phoneme boundary, arguing that it is an artifact present in some stimulus presentation paradigms and not in others. If a category boundary exists, it is likely a theoretical and/or psychological construct representing a point in perception at which a phoneme is or is not judged to be sufficiently similar to a specific exemplar or prototype.

1.2.4 Variability in Category Boundaries

Whether or not category boundaries are an artifact of experimental design, there is evidence of group variability. For example, parents, whose goal is to understand their child's speech, may have a more lenient phoneme boundary for /ɹ/ than SLPs, whose goal is to correct misarticulated /ɹ/, and who may make stricter judgments about /ɹ/. In one study (Chaney, 1988), SLPs tended to categorize more /ɹ/ attempts as /w/ than other adults and children.

1.3 Goals and Questions for the Current Study

Investigating what makes a reliable perceptual judgment task is a research question on its own. It also provides an opportunity to gather data about speech perception in general.
This section provides a summative discussion concerning the perceptual task of /ɹ/ judgment, first concerning the design of reliable tasks for judgment of /ɹ/ (Section 1.3.1), and then considering general questions of speech perception (Sections 1.3.1.1 and 1.3.1.2). This discussion is followed by the questions and predictions for the current study (Section 1.3.2).

1.3.1 What Makes a Reliable /ɹ/ Judgment Task?

Exploring the factors that make for a reliable /ɹ/ judgment task is useful for research and therapy. For clinical research, it helps establish criteria for reliable measurements of treatment effects in treatment studies. For therapy, it can help to identify principles that would make progress measurements reliable. The literature review above indicates a number of factors to consider. For the current paper, the following key variables were examined: number of raters and reliability judgment measures, characteristics of the individual (including bilingualism) and task design. Because of the variability across listeners, it was considered that knowing the impact of individual characteristics on judgments of /ɹ/ might prove useful. Because the local environment includes many children and some SLPs speaking English as an Additional Language (EAL), it was considered important to know whether being a native speaker makes a difference in reliability judgments. Finally, determining whether different tasks affect reliability was considered important for future designs. These variables are discussed further below relative to specific questions for the study.

1.3.1.1 Individual Listener Characteristics

Many of the studies reviewed so far have found perceptual differences between different groups of listeners based on individual characteristics such as age (Menyuk & Anderson, 1969; Ohde & Sharf, 1988; Slawinski & Fitzgerald, 1998) and SLP training or experience (Sharf & Benson, 1983; Chaney, 1988; Ohde & Sharf, 1988).
All of these studies (with the possible exception of Ohde and Sharf, 1988) indicate that adult SLPs are more likely to identify questionable (non-prototypical, non-exemplary) /ɹ/ stimuli as /w/ than children. The current study included both trained adults (SLPs) and age peers of the children receiving therapy for /ɹ/ as a replication and extension of earlier studies.

Another individual characteristic that has not been explored, but is highly relevant in the local context, is bilingualism. It is not known whether bilingualism with English as an additional or second language has an effect on perception of /ɹ/. Often, English monolinguals are used as participants in speech and language research. In light of prototype and exemplar models of speech perception, those participants who have a first language other than English that does not contain /ɹ/ in its phonemic inventory might have a weaker NLM effect or prototype. NLM theory predicts a difference for those with English as a second language, i.e., less perceptual compression near the prototype. Hypothetically, listeners who have an L1 with no /ɹ/ might be better able to discern differences between stimuli closer to the prototype. The current study included listeners who spoke English as an Additional Language in order to explore this variable.

1.3.1.2 Task Design

Task design may be a key variable in differences noted in the various studies. Some of the previous research asked raters to identify items from minimal word pairs that contain /ɹ/ and/or /w/, /j/ and /l/ (e.g., Menyuk & Anderson, 1969; Shuster, 1998; Wolfe et al., 2003). If the word selected by the listener matched the intended word of the speaker, then the /ɹ/ was considered to be pronounced (and judged) effectively. Other studies have sought to rule out lexical effect by using only syllables for the judgment task (e.g., Sharf & Benson, 1982; Sharf & Benson, 1983; Sharf et al., 1988).
In the latter experiments, listeners were asked outright whether the syllable contained an /ɹ/ or not and their response was recorded. Treatment studies have also asked listeners to judge the /ɹ/ production outright in words and phrases (e.g., Adler-Bock et al., 2007; Modha et al., in press). When faced with this sort of /ɹ/ judgment task, raters face the difficulty of listening to an /ɹ/ without the help of an 'external anchor' (comparison stimulus).

Instead of using a singleton identification task, a paired comparison design might improve reliability. Yiu, Chan & Mok (2007) outlined a study that compared voice quality judgment tasks with and without 'external anchors.' The external anchors were synthetic examples on a multi-point continuum of good and less acceptable voice quality. Raters were able to listen to the stimulus and the anchors freely for comparison during a voice quality judgment task. Inter-rater reliability as calculated by intra-class correlation was stronger when there was an external anchor. Although phoneme judgment is not the same as voice quality judgment, it might be a general fact of perception that external anchors improve reliability. Prototype and exemplar models of speech perception give no account of non-developmental variability in internal representation. Hypothetically, based on the voice quality study (Yiu et al., 2007), a paired comparison task should improve reliability of ratings. Thus, both a comparison and an identification task were included in the current study.

1.3.2 Questions and Predictions for the Current Study

The issues outlined in the literature review led to the following questions and predictions for the current study.

1. How do listener characteristics, such as age, experience with children, experience with English, bilingual status and native speaker status affect ratings of /ɹ/?
As noted throughout the literature review, listeners and listener groups vary in tendency and reliability of perceptual phoneme judgments. The current study set out to replicate and extend this previous research. The main comparison in this study was between ratings of a group of SLPs and a group of child peers of the speakers with /ɹ/ difficulty. Based on previous studies it was predicted that the SLPs would be more likely to rate non-prototypical attempted /ɹ/ productions as off-target and that the children would demonstrate more variability in their ratings.

Secondary participant factors that were explored were the effects of bilingualism, including years in Canada, and age. The questions were whether status as a native speaker and years of experience in an English-speaking country had effects on rating scores or variability. Since the research was undertaken in Vancouver, a city with a large immigrant population, it was decided that participants should not be excluded because of language status in order to reflect the natural listening situation. Instead, having English as a second language would be examined as a factor in judgment tendency and reliability. The criteria for being identified as a non-native speaker were being born in another country and both parent and child speaking another language. The null hypothesis was adopted for participant factors.

2. How do the ratings obtained in a paired-comparison task differ from those obtained in an identification task?

As indicated in the literature review, task design might affect perceptual judgments. The major question regarding task types was whether performance on a paired comparison task would differ from performance on a more traditional identification task. It was predicted that the paired comparison design might result in greater reliability, because of the potential anchoring effect (as noted for an anchoring task in Yiu et al., 2007).
Different orders of tasks (identification first or paired comparison first) might also result in a difference in reliability. This could provide some insight into what Yiu, Chan & Mok (2007) refer to as factors that affect internal standards.

3. How does fatigue or experience affect ratings over a large number of repeated stimulus presentations?

A third task variable concerned the change in perception over the course of a long series of judgments. Since the experiment involved making over 100 judgments, the question was whether there would be fatigue effects, a learning effect, both or neither. In the case of the SLPs, the null hypothesis was adopted: that fatigue would not affect variability or judgment tendency for perception of /ɹ/. In the case of the children, the hypothesis was that children's reliability in ratings would suffer from boredom, but that there would be no fatigue or learning effect on judgment tendency.

Chapter 2: Methods

2.1 Participants

In this study, there were 24 children and 23 SLP participants (see Tables 1 and 2). The two groups of participants were recruited through advertisements, with 16 of the 24 children recruited through a YMCA after-school program. The child group (aged 7-12, mean = 9.3, SD = 1.6) was balanced for gender and included 16 native speakers and 8 children born in China. By age, the child group matched the speaker group whose data they were evaluating. The children with first languages other than English all had either Mandarin or Cantonese as their first language. Cantonese does not have an /ɹ/ in its phonemic inventory, but Mandarin has a post-vocalic rhotic sound.

Adult SLP subjects were recruited through the Vibrations newsletter, a publication of the British Columbia Association of Speech-Language Pathologists, or by word of mouth (snowballing effect). Most of the SLPs were female, native Canadians (with L1 of English). By decade they ranged in age from 20s to 60s, with the mean age being 44.5 years.
Information was also taken regarding the SLPs' experience with children. Nine were SLPs in a school setting, six worked in public health units, one worked in a hospital, one worked in a private clinic and the others did not specify setting. Almost half (43%) of the SLP participants were parents. Mean years of experience with children under 12 years of age was 15.5 (SD = 10.3). Two of the SLP participants reported having children with speech difficulties and two reported having had speech difficulties as children themselves.

Table 1
Child Participant Characteristics^a

Listener  Sex  Age  Native born?  Years in Canada  L1         L2
1         M    10   n             10               Cantonese  English, French
2         M    11   y             11               English    A bit of French
3         F    10   n             1                Chinese    English
4         M    7    y             7                English    French
5         M    9    n             1                Cantonese  Mandarin, English
6         M    12   y             12               English    Portuguese
7         M    9    y             9                English    French
8         F    9    y             9                English    French
9         M    11   y             11               English    French
10        F    10   y             10               English
11        F    9    n             4                Chinese    English
12        M    11   y             11               English    German
13        F    7    y             7                English    French
14        F    9    y             9                English    French, Japanese
15        F    9    y             9                English    Italian
16        M    7    y             7                English
17        F    9    n             3                Mandarin   English
18        F    12   y             12               English    French
19        F    8    y             8                English    French, Farsi
20        F    9    y             9                English    Mandarin
21        M    9    n             3                Mandarin   English
22        M    7    n             1.5              Cantonese  English
23        F    8    n             4                Cantonese  English
24        M    12   y             12               English    French

^a See Appendix A for questionnaire used to elicit biographical information.

Table 2
SLP Participant Information^b

#  Sex  Age  Native born?  Years in Canada  L1  L2  L3  Home language
1  F  60+  y  >61  English  French  German  English
2  F  51-60  y  >51  English  English  English
3  F  20-30  n  >18  Gujarati  English  English, Gujarati
4  F  51-60  y  >51  English  English
5  F  20-30  y  >20  English  English
6  F  41-50  y  >41  English  English
7^a  M  41-50  n  10  English  English
8  F  31-40  y  >31  English  French  English
9  F  41-50  n  3  English  Cantonese  Mandarin  Cantonese
10  F  41-50  y  >41  English  English
11  F  51-60  n  26  English  English
12  F  41-50  n  22  English  English
13  F  51-60  n  26  English  English
14  F  41-50  n  26  English  English
15  F  51-50  y  >50  English  English
16  F  20-30  y  >20  English  Spanish  French  English
17  F  31-40  y  >31  English  English
18  F  31-40  y  >31  English  English
19  F  41-50  y  >41  English  English
20  F  20-30  y  >20  English  English
21  F  60+  y  >60  English  English
22  F  41-50  y  >41  English  ASL  English
23  F  31-40  y  >31  English  English

^a The only male SLP.
^b See Appendix A for questionnaire used to elicit biographical information.

2.2 Procedures

2.2.1 Stimuli Selection and Preparation

The stimuli were taken from an existing data set from four students in northern BC aged 7, 8, 11, and 12 (Bernhardt et al., 2007). These students had residual difficulty with /ɹ/ and were undergoing ultrasound treatment. Three were male; the only female (NA4) was 12 years old. Two words, "star" and "rabbit", in pre-treatment and post-treatment single-word samples, were selected as stimuli. Words from both pre- and post-treatment conditions were selected in order to create an acoustically diverse sample of /ɹ/ productions ranging from frank misarticulations to a normal [ɹ], i.e. closely simulating /ɹ/ in treatment. The tokens were trimmed to the syllables /aɹ/ and /ɹæ/ in order to eliminate a lexical effect and reduce the duration of the listening task. The word "rabbit" was cut at the beginning of the vocal signal and at the middle of the vowel. The word "star" was cut at the beginning of the vowel formants' steady state and at the end of the vocal signal.
The samples were of field quality, and thus the amplitude of the speech signal and the noise floor varied considerably. Attempts to compensate by adding noise or boosting the signal resulted in poorer quality samples and tended to increase the idiosyncratic nature of each sample; the signals were therefore not modified. Four students by two words by two conditions resulted in 16 stimuli for the experiments. Limiting the number of stimuli was a consideration because the experimental design called for a comparison of every pair-wise combination of the stimuli: n(n-1)/2 pairs. (This quadratic function results in a rapid escalation in the number of pairs as the number of stimuli increases.) The goal was to present stimuli to the participants for no more than one-half hour, because it was speculated that a longer task would result in boredom and attrition, particularly for the child listeners.

2.2.2 Stimuli Acoustics

In order to examine the relationship between acoustic variables and perceptual variables, stimuli acoustics were measured by two raters using spectrograms obtained from PRAAT version 4.4.13 (Boersma & Weenink, 2006). First, formant height was calculated by measuring an instantaneous slice at the point of greatest constriction between F2 and F3. This resulted in considerable differences between raters, however. In order to improve inter-rater reliability, an average formant height for the steady state of the /ɹ/ was obtained, resulting in an average difference of 22 Hz between raters. Table 3 shows acoustic values for the stimuli.

2.2.3 Experimental Tasks

The experiment involved two tasks: an identification task and a pair-wise comparison task. In the identification task, a category judgment was made for each stimulus. In the comparison task, raters chose a better /ɹ/ from every pair-wise combination of the 16 stimuli (120 pairs). The basic order of the stimuli within each task was randomized using the RAND function from Microsoft EXCEL v. 10.0.2614.0.
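The pair count for the comparison task can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the thesis randomized presentation order with a spreadsheet RAND function, so the seeded shuffle here is a stand-in, and the seed and the function names `n_pairs` and `comparison_order` are hypothetical. The stimulus labels follow the naming pattern used in Table 3.

```python
from itertools import combinations
import random


def n_pairs(n):
    """Number of unordered pairs among n stimuli: n(n-1)/2."""
    return n * (n - 1) // 2


def comparison_order(stimuli, seed=0):
    """Return every unordered pair of stimuli in a shuffled presentation order.

    Assumption: a seeded shuffle stands in for the spreadsheet
    randomization described in the text.
    """
    pairs = list(combinations(stimuli, 2))
    random.Random(seed).shuffle(pairs)
    return pairs


# 4 speakers x 2 syllables x 2 conditions = 16 stimuli.
stimuli = [f"{cond}NA{spk}{syl}"
           for cond in (1, 2) for spk in (1, 2, 4, 5) for syl in ("ar", "ra")]
order = comparison_order(stimuli)

print(len(stimuli), len(order))            # 16 stimuli give 120 pairs
print([n_pairs(n) for n in (8, 16, 32)])   # doubling n roughly quadruples the pairs
```

Doubling the stimulus set from 16 to 32 would raise the number of comparisons from 120 to 496, which is why the stimulus count had to be kept small.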
As a measure of intra-rater reliability, some stimulus presentations were repeated as follows. For the identification task, the first and last four stimuli from the basic random order were duplicated for a total of eight. For the comparison task, the first and last 10 stimulus presentations from the basic random order were repeated.

In order to have measures of the effect of perceiving repetitive stimuli over time, four versions of the experiment were developed. Half of the participants carried out the identification task before the comparison task to control for order effects, the other half carrying out the tasks in the opposite order. The other variation, applied to half of each of the previously mentioned groups, was a bisection and re-ordering of the paired comparison task so that half of the participants received items in the selected order and the other half received items 61-120 before items 1-60 (again, to control for order effects). This allowed a comparison of ratings from the first half and the second half of the stimuli presentations.

Table 3
Acoustic Data

Condition  Stimulus  F1 (Hz)  F2 (Hz)  F3 (Hz)  F3-F2 (Hz)  Duration (sec)
Pre-Tx     1NA2ar    524      1294     2672     1378        0.367
Pre-Tx     1NA2ra    474      1275     2745     1470        0.407
Pre-Tx     1NA4ar    665      1583     2245     662         0.307
Pre-Tx     1NA4ra    545      1932     2446     514         0.318
Pre-Tx     1NA5ar    782      1817     2793     976         0.361
Pre-Tx     1NA5ra    450      1296     3007     1711        0.493
Pre-Tx     1NA1ar    437      1328     2530     1202        0.512
Pre-Tx     1NA1ra    430      1249     2685     1436        0.287
Post-Tx    2NA2ar    639      1190     2196     1006        0.336
Post-Tx    2NA2ra    584      1620     2824     1204        0.198
Post-Tx    2NA4ar    669      1710     2243     534         0.657
Post-Tx    2NA4ra    577      2012     2463     451         0.539
Post-Tx    2NA5ar    807      1952     3310     1359        0.482
Post-Tx    2NA5ra    557      1872     3121     1249        0.182
Post-Tx    2NA1ar    598      1530     2639     1110        0.439
Post-Tx    2NA1ra    488      1519     2608     1090        0.23

Note. The stimulus name format can be interpreted as follows:
NA: from a large northern town in British Columbia
NA1xx, NA2xx, etc.: speaker number and token, i.e. 'ra' or /ɹæ/ and 'ar' or /aɹ/
1NA, etc.: 1 = pre-treatment, 2 = post-treatment

The sound files were presented using Microsoft PowerPoint v.
10.0.2623.0 on laptops of various makes and models through SONY and BOSE over-the-ear headphones. Live verbal instructions for both tasks were given at the beginning of the experiment. In these instructions, extreme examples of poorly articulated /ɹæ/ and /aɹ/ syllables were demonstrated by saying /wæ/ and /aw/. Task prompts were displayed on each page of the presentation. In the identification task, the participants listened to a stimulus and responded to the prompt "Is this an R?" In the comparison task, participants listened to two stimuli and responded to the prompt "Which R is better?" Each sound file was played three times. For the comparison task, alternate stimuli were played in successive pairs three times. The participants were able to advance the slides without listening to each presentation of the sound clips. There were 16 animated GIFs dispersed evenly throughout the presentation in order to provide a short break for the auditory system and to motivate the participants. The experiment took anywhere from 30 minutes to 1 hour. See Appendix B for a screenshot of the presentation.

2.2.4 Data Analysis Procedures

Data were hand counted from score sheets and transferred to a Microsoft Excel spreadsheet. For the identification task, responses were coded as 0, corresponding to a judgment as not /ɹ/, and 1, corresponding to a judgment as /ɹ/. For the paired comparison task, the stimulus selected as representing a better /ɹ/ was coded as 1 and the other was coded as 0. For the comparison task, scores resulting from each comparison were totaled to provide a composite score out of 15.

2.2.4.1 Analysis of Ratings Data

Most of the statistical analyses were performed with SPSS 12.0 Graduate Student Version. Intra-class correlation coefficients were calculated using a web-based inter-rater reliability calculator (http://www.med-ed-online.org/rating/reliability.html) and confirmed using R, v2.3.1 (2006).
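The composite scoring for the comparison task described in 2.2.4 can be sketched as follows. This is a minimal illustration, not the procedure actually used (scores were hand counted): the function name `composite_scores` and the toy rater are hypothetical, and it assumes that only the 120 unique pairs, not the repeated reliability items, count toward the composite.

```python
from itertools import combinations


def composite_scores(stimuli, choose_better):
    """Tally, for one rater, how often each stimulus is chosen as the better one.

    choose_better(a, b) must return either a or b. The chosen stimulus is
    coded 1 and the other 0, so each composite score is the sum of codes
    over the comparisons that the stimulus takes part in.
    """
    scores = {s: 0 for s in stimuli}
    for a, b in combinations(stimuli, 2):
        winner = choose_better(a, b)
        if winner not in (a, b):
            raise ValueError("choose_better must return one of its arguments")
        scores[winner] += 1
    return scores


# Hypothetical rater who always prefers the stimulus with the lower index.
stimuli = list(range(16))
scores = composite_scores(stimuli, choose_better=min)

print(max(scores.values()))   # a stimulus preferred in all its pairs scores 15
print(sum(scores.values()))   # one point is awarded per pair, 120 in total
```

Because each of the 16 stimuli appears in 15 pairs, 15 is the ceiling for a composite score, and every rater's composite scores sum to 120 regardless of the choices made.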
R was also used for multivariate analysis of variance (MANOVA). In order to determine group differences between SLPs and children, t-tests were performed on comparison and identification scores for each token. To understand possible causes for group differences, acoustics for stimuli on which children and SLPs differed significantly were examined. In order to determine significant acoustic factors for the group, Pearson correlation coefficients were obtained for acoustic variables and group mean scores on both tasks.

2.2.4.2 Inter-rater Reliability

Inter-rater reliabilities for both tasks and both groups were determined using the intra-class correlation (Ebel, 1951; Shrout & Fleiss, 1979). Intra-class correlation measures the variability in the means relative to the total variability, expressed as r = A/(A + B), where A is the variance in the true components in the sample and B is the variance in errors in the sample. In order to answer research questions about how many raters are needed for a reliable judgment task, inter-rater reliabilities for differing numbers of raters were determined using an inter-rater reliability web calculator (http://www.med-ed-online.org/rating/reliability.html), and the coefficients for a single rater and for the group mean were confirmed using R.

2.2.4.3 Intra-rater Reliability

Intra-rater reliability was determined by calculating mean percent similarity among the eight repeated stimulus presentations for the identification task and the 20 repeated stimulus presentations for the comparison task. In order to determine task and group differences in intra-rater reliability, t-tests were performed on percent similarity scores.

2.2.4.4 Listener Variables

Listener variables were analyzed in a number of ways. The first was an examination of variability in intra-rater reliability. For the children, listener variables included gender, whether or not the child was born in Canada, age, and time living in Canada.
T-tests (a measure of differences between categories) were performed for the listener variables of gender and whether or not the child was born in Canada. Pearson correlations (measurements of relationships between sets of continuous data) were obtained for age and time living in Canada. For the SLPs, listener variables included whether or not the SLP was born in Canada, whether or not the SLP was a parent, age, years of experience and time living in Canada. T-tests were performed for the listener variables of whether or not the SLP was born in Canada and whether or not the SLP was a parent. Pearson correlations were obtained for age, years of experience and time living in Canada.

In order to measure differences between the children's and SLPs' identification scores, t-tests were performed to compare the two groups' mean identification scores for each stimulus. Analysis of the comparison task scores by listener variables was conducted using a MANOVA. Listener information about stimuli showing significant differences was examined using descriptive statistics for groups and correlations for continuous factors. Acoustic variables were examined in relation to listener variable effects.

Chapter 3: Results

3.1 Analysis of Ratings Data

3.1.1 Identification Task

Mean identification scores for the children and SLPs were calculated for each token. Figure 1 displays mean identification scores for the two groups.

Figure 1. Mean Identification Score for SLPs and Children (x-axis: stimulus; series: children, SLPs)

For the identification task, an independent samples t-test was performed for the distribution of mean scores in the SLP and child groups. There were no significant differences overall between mean SLP ratings and mean children's ratings (t = -.182, p = 0.857). SLP mean scores for pre-treatment stimuli were on average lower than the children's. Conversely, SLP mean scores for post-treatment stimuli were, on average, higher than the children's.
However, t-tests comparing children's and SLPs' mean ratings on either the pre-treatment or post-treatment sets indicated no significant difference between listener groups (t = -.097, p = 0.924 for pre-treatment and t = -.310, p = 0.761 for post-treatment).

A t-test was performed on the children's and SLPs' distributions of raw scores for each stimulus. After applying a Bonferroni correction for the 16 repeated tests, setting α at .003 (.05/16), none of the t-tests showed significance.

Children's identification scores and the stimulus formant heights were compared using an x,y scatterplot as displayed in Figure 2.

Figure 2. Identification Score for Children by Stimulus Formant Height (x-axis: identification score; y-axis: formant frequency in Hz)

Pearson correlation coefficients between children's identification scores and acoustic variables were calculated and are shown in Table 4.

Table 4
Correlations for Children's Mean Identification Task and Acoustic Variables

                     F1     F2      F3       F3-F2   Duration
Pearson correlation  0.405  -0.107  -0.524*  -0.337  -0.32
p                    0.12   0.692   0.037    0.15    0.226

*p < .05

The SLP identification scores and stimulus formant heights were compared using an x,y scatterplot as displayed in Figure 3.

Figure 3. Identification Score for SLPs by Stimulus Formant Height (x-axis: identification score; y-axis: formant frequency in Hz)

Pearson correlation coefficients between SLPs' identification scores and acoustic variables were calculated and are shown in Table 5.

Table 5
Correlations for SLPs' Mean Identification Task and Acoustic Variables

                     F1     F2      F3       F3-F2   Duration
Pearson correlation  0.149  -0.274  -0.659*  -0.368  0.081
p                    0.581  0.305   0.005    0.161   0.765

*p < .01

3.1.2 Comparison Task

Mean comparison scores for the children and SLPs were calculated for each token in the pair-wise comparison. Figure 4 displays mean comparison scores for the two groups.
Figure 4. Mean Comparison Score for SLPs and Children (x-axis: stimulus; series: children, SLPs)

For the comparison task, a t-test was performed for the distribution of mean scores in the SLP and child groups. There were no significant differences overall between mean SLP and children's ratings (t = -.029, p = .977). SLP mean scores for pre-treatment stimuli were lower than the children's. Conversely, SLP mean scores for post-treatment stimuli were higher than the children's. However, t-tests did not show significant differences (t = 0.810, p = .431 for pre-treatment and t = -.642, p = .531 for post-treatment).

T-tests to examine differences between SLP and children's comparison scores were performed for each stimulus. The results are displayed in Table 6.

Table 6
T-tests for Comparison Task Scores

Stimulus  Mean children's score  Mean SLP score  Absolute difference  t      p
1NA2ar    8.31                   5.65            2.66                 3.09   0.003
1NA2ra    7.13                   9.30            2.18                 -2.38  0.022
1NA4ar    11.65                  9.57            2.08                 3.29   0.002*
1NA4ra    6.15                   7.61            1.46                 -1.68  0.100
1NA5ar    7.52                   4.87            2.65                 3.40   0.001*
1NA5ra    4.40                   2.87            1.53                 2.42   0.020
1NA1ar    8.35                   6.13            2.22                 2.44   0.019
1NA1ra    6.85                   7.30            0.45                 -0.58  0.567
2NA2ar    12.06                  11.35           0.71                 0.93   0.360
2NA2ra    4.50                   6.65            2.15                 -2.33  0.024
2NA4ar    9.77                   11.30           1.53                 -1.49  0.141
2NA4ra    4.52                   6.30            1.78                 -2.05  0.047
2NA5ar    8.00                   5.26            2.74                 3.21   0.002*
2NA5ra    3.92                   7.13            3.21                 -4.14  0.000**
2NA1ar    11.15                  11.04           0.10                 0.11   0.915
2NA1ra    5.31                   7.65            2.34                 -2.76  0.008

For repeated t-tests, the Bonferroni correction divides α by n: *p < .003, **p < .001

Children's comparison scores and formant heights were compared using an x,y scatterplot displayed in Figure 5.

Figure 5. Comparison Score for Children by Stimulus Formant Height (x-axis: comparison score; y-axis: frequency in Hz; series: F1, F2, F3)

Pearson correlation coefficients between children's comparison scores and acoustic variables were calculated and are shown in Table 7.

Table 7
Correlations for Children's Mean Comparison Task and Acoustic Variables

                     F1     F2      F3       F3-F2   Duration
Pearson correlation  0.375  -0.341  -0.553*  -0.223  0.264
p                    0.153  0.196   0.026    0.407   0.323

*p < .05

SLP comparison scores and formant heights were compared using an x,y scatterplot as displayed in Figure 6.

Figure 6. Comparison Score for SLPs by Stimulus Formant Height (x-axis: comparison score; y-axis: formant frequency in Hz)

Pearson correlation coefficients between SLPs' comparison scores and acoustic variables are shown in Table 8.

Table 8
Correlations for SLPs' Mean Comparison Task and Acoustic Variables

                     F1     F2      F3       F3-F2   Duration
Pearson correlation  0.129  -0.181  -0.678*  -0.456  -0.033
p                    0.634  0.502   0.004    0.076   0.904

*p < .01

3.2 Intra-Rater Reliability

3.2.1 Identification Task

As a measure of intra-rater reliability for the identification task, mean similarity between repeated ratings of eight stimuli was calculated. Mean similarity for children was 0.66 (SD = 0.21) and for the SLPs, 0.80 (SD = 0.14). A t-test was performed on the intra-rater reliabilities for the children and SLP groups. There was a significant difference between groups (t = -2.622, p = 0.012).

3.2.2 Comparison Task

Intra-rater reliability for the comparison task was calculated as mean similarity between repeated ratings of 20 stimuli. Mean similarity for children was 0.79 (SD = 0.09) and for the SLPs, 0.87 (SD = 0.10). A t-test was performed on the intra-rater reliabilities for the children and SLP groups. There was a significant difference between groups (t = -2.791, p = 0.008).

3.2.3 Task Comparison

A t-test was performed on the SLPs' intra-rater reliability for the identification task and the comparison task (see values above).
The comparison task was better than the identification task, although the difference narrowly missed significance at the .05 level (t = -2.004, p = 0.051). A t-test was performed on the children's intra-rater reliability for the identification task and the comparison task. The comparison task was significantly better than the identification task (t = -2.771, p = 0.008).

3.3 Inter-Rater Reliability

Inter-rater reliability was calculated for each task by each group using intra-class correlation. Theoretical inter-rater reliability coefficients for different numbers of raters were calculated and are displayed in Figure 7 and Table 9. Generally, the scores from the identification task had lower intra-class correlation coefficients than the composite scores from the comparison task. Children's scores had lower intra-class correlation for the identification task than SLPs' scores, but the two groups showed virtually identical intra-class correlations on the comparison task.

Figure 7. Inter-Rater Reliabilities By Number of Raters for Tasks and Groups (x-axis: number of raters; series: comparison task - children, identification task - children, comparison task - SLPs, identification task - SLPs)

Table 9
Inter-Rater Reliabilities By Number of Raters for Tasks and Groups

Number of raters                1     2     3     4     5     10    15    25
Comparison task - children      0.43  0.6   0.69  0.75  0.79  0.88  0.92  0.95
Identification task - children  0.19  0.32  0.41  0.49  0.54  0.7   0.78  0.85
Comparison task - SLPs          0.39  0.57  0.66  0.72  0.77  0.87  0.91  0.94
Identification task - SLPs      0.28  0.44  0.54  0.61  0.66  0.8   0.85  0.91

3.4 Fatigue or Experience Effect

In order to determine whether fatigue or experience was a factor in identification scores, listeners were divided into two groups: those who performed the identification task before the comparison task and those who performed the identification task after the comparison task. T-tests were performed on identification task intra-rater reliabilities for both groups.
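As an aside on the projected reliabilities in Table 9: the text does not say which formula the web calculator applied, but the tabled values are consistent with the Spearman-Brown prophecy formula, which projects the reliability of the mean of k raters from the single-rater coefficient. The sketch below is offered under that assumption only; the function name `spearman_brown` is hypothetical.

```python
def spearman_brown(r_single, k):
    """Projected reliability of the mean of k raters, given single-rater reliability.

    Assumption: the Spearman-Brown prophecy formula is used here because it
    reproduces Table 9; the source does not name the formula.
    """
    return k * r_single / (1 + (k - 1) * r_single)


rater_counts = [1, 2, 3, 4, 5, 10, 15, 25]

# Single-rater coefficient for the children's comparison task, from Table 9.
projected = [round(spearman_brown(0.43, k), 2) for k in rater_counts]
print(projected)  # [0.43, 0.6, 0.69, 0.75, 0.79, 0.88, 0.92, 0.95]
```

The other rows of Table 9 follow the same curve to within rounding of their single-rater coefficients. On this reading, roughly five raters are needed before the comparison task reaches a reliability near 0.8, whereas the children's identification task needs more than 15.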
For the children, there was no significant difference between the two groups (t = .596, p = .557). Also, for the SLPs, there was no significant difference between the two groups (t = -.111, p = .913).

3.5 Listener Variables

3.5.1 Intra-Rater Reliability

3.5.1.1 Children

In order to determine whether categorical participant variables were a significant factor for the children, t-tests were performed on intra-rater reliability. They showed no significant difference between native Canadians and non-native Canadians (t = .548, p = .549 for the identification task; t = .617, p = .518 for the comparison task) and no significant difference between males and females (t = .237, p = .815 for the identification task; t = .217, p = .831 for the comparison task). In order to determine whether continuous listener variables were a significant factor, Pearson correlations were performed on intra-rater reliability. There were no significant correlations between age and intra-rater reliability (r = -.283, p = .180 for the identification task; r = .162, p = .450 for the comparison task), or between time in Canada and intra-rater reliability (r = -.047, p = .828 for the identification task; r = -.115, p = .591 for the comparison task).

3.5.1.2 SLPs

In order to determine whether categorical listener variables were a significant factor, t-tests were performed on intra-rater reliability. They showed no significant difference between native Canadians and non-native Canadians (t = .913, p = .371 for the identification task; t = .203, p = .841 for the comparison task), and no difference between parents and non-parents (t = .337, p = .740 for the identification task; t = -1.368, p = .186 for the comparison task). In order to determine whether continuous listener variables were a significant factor, Pearson correlations were performed on intra-rater reliability scores and listener variables.
There were no significant correlations between age and intra-rater reliability (r = .139, p = .526 for the identification task; r = .113, p = .608 for the comparison task) or between years of experience and intra-rater reliability (r = .298, p = .168 for the identification task; r = -.041, p = .854 for the comparison task).

3.5.2 Identification Scores

A t-test was performed on mean SLP and children's scores on the identification task. The SLPs' scores (mean = .538, SD = .240) were slightly lower than the children's (mean = .554, SD = .281). This difference was not significant (t = -.182, p = .857).

3.5.3 Comparison Scores

For the SLP group and the children group, MANOVAs were performed for the comparison scores to determine the effect of listener variables on comparison scores. MANOVA results are displayed in Tables 10 to 12.

Table 10
MANOVA for Children Listener Variables and Comparison Scores

          Version       Gender        Age           Native Speaker  Years in Canada
Stimulus  F      p      F      p      F      p      F      p        F      p
1NA2ar    1.215  0.336  0.005  0.945  0.326  0.576  3.181  0.093    1.270  0.276
1NA2ra    1.188  0.346  0.006  0.939  0.258  0.618  1.196  0.290    0.232  0.637
1NA4ar    0.744  0.542  0.353  0.561  0.030  0.866  1.398  0.254    0.061  0.808
1NA4ra    0.161  0.921  1.319  0.268  0.343  0.566  0.070  0.748    0.179  0.678
1NA5ar    0.563  0.647  0.250  0.624  1.673  0.214  3.112  0.097    0.882  0.362
1NA5ra    1.062  0.393  0.244  0.628  0.469  0.503  0.404  0.534    1.108  0.308
1NA1ar    2.040  0.149  0.159  0.695  0.001  0.982  0.277  0.606    0.007  0.934
1NA1ra    1.581  0.233  2.126  0.164  0.318  0.581  1.572  0.228    0.059  0.811
2NA2ar    1.897  0.171  0.006  0.942  0.023  0.881  0.557  0.466    0.861  0.367
2NA2ra    2.372  0.109  0.821  0.379  0.254  0.621  2.736  0.118    0.022  0.884
2NA4ar    2.697  0.081  0.169  0.687  2.635  0.124  0.247  0.626    0.040  0.845
2NA4ra    0.305  0.821  0.036  0.851  0.518  0.482  0.766  0.395    0.547  0.470
2NA5ar    1.815  0.185  0.068  0.798  3.783  0.070  3.056  0.100    0.121  0.733
2NA5ra    2.647  0.084  0.342  0.567  0.703  0.414  2.600  0.126    0.630  0.439
2NA1ar    1.387  0.283  0.025  0.876  0.280  0.604  0.933  0.348    0.064  0.804
2NA1ra    2.364  0.110  0.772  0.393  1.308  0.270  1.006  0.331    0.004  0.948

*p < .01, **p < .005

Listener variables were not significant factors in children's comparison scores for any of the stimuli.

Table 11
MANOVA for SLP Listener Variables (version, age, native Canadian, language) and Comparison Scores

        Version         Age            Native        Second Language
Token   F      p        F       p      F      p      F      p
1NA2ar  0.643  0.609    0.707   0.609  0.198  0.668  1.007  0.345
1NA2ra  1.943  0.201    0.949   0.484  2.235  0.173  0.834  0.288
1NA4ar  1.329  0.331    0.373   0.822  0.007  0.937  0.504  0.498
1NA4ra  4.039  0.051    3.699   0.055  0.020  0.892  1.743  0.223
1NA5ar  3.801  0.058    1.057   0.437  0.343  0.574  5.242  0.051
1NA5ra  2.923  0.100    0.695   0.616  0.115  0.743  0.480  0.508
1NA1ar  0.893  0.486    2.423   0.133  0.124  0.734  3.639  0.093
1NA1ra  0.313  0.816    0.378   0.819  0.048  0.833  0.036  0.854
2NA2ar  1.279  0.246    8.751*  0.005  1.112  0.323  4.798  0.060
2NA2ra  1.283  0.345    1.292   0.350  0.018  0.896  0.404  0.543
2NA4ar  3.792  0.058    2.008   0.186  1.029  0.340  0.925  0.364
2NA4ra  0.285  0.835    0.200   0.932  0.469  0.513  0.167  0.694
2NA5ar  3.275  0.080    0.357   0.833  1.499  0.256  1.050  0.335
2NA5ra  0.108  0.953    0.997   0.462  1.035  0.339  0.043  0.840
2NA1ar  11.41  0.003**  0.127   0.968  0.012  0.916  0.258  0.625
2NA1ra  0.481  0.704    0.269   0.890  0.577  0.569  0.381  0.554

*p < .01, **p < .005

Listener variable "Version" was shown to be a significant factor in the SLPs' comparison scores for stimulus 2NA1ar.
Table 12
MANOVA for SLP Listener Variables (parent, years experience with children) and Comparison Scores

        Parent        Years Experience with Children
Token   F      p      F      p
1NA2ar  0.688  0.431  0.001  0.979
1NA2ra  0.896  0.372  0.050  0.829
1NA4ar  0.931  0.363  0.406  0.542
1NA4ra  4.061  0.079  0.066  0.804
1NA5ar  1.311  0.285  2.902  0.127
1NA5ra  0.999  0.347  0.006  0.940
1NA1ar  0.082  0.782  0.925  0.364
1NA1ra  0.024  0.880  0.306  0.595
2NA2ar  5.488  0.047  5.103  0.054
2NA2ra  6.320  0.036  0.268  0.619
2NA4ar  0.823  0.391  0.989  0.349
2NA4ra  0.521  0.491  4.314  0.071
2NA5ar  0.869  0.379  0.837  0.387
2NA5ra  4.737  0.061  0.832  0.388
2NA1ar  0.001  0.977  0.010  0.921
2NA1ra  2.891  0.128  0.120  0.738

Mean comparison scores and mean position by presentation version are presented in Table 13. Position is a numerical value equal to the number of stimulus presentations up to and including the stimulus presentation in question.

Table 13
Mean Comparison Score and Position for Stimulus 2NA1ar by Version

Version  Mean comparison score  Mean position
1        13.5                   66
2        13.5                   66
3        8.5                    66
4        8.2                    71

Listener variable "Age" was a significant factor in the SLPs' comparison scores for stimulus 2NA2ar.

Figure 8. SLP Age by Mean Versus Mean Comparison Score for 2NA2ar

Chapter 4: Discussion

4.1 Goals and Questions for the Study

There were three main questions for the current study: (1) How do listener characteristics, such as age, experience with children, experience with English, bilingual status and native speaker status, affect auditory perceptual ratings of /ɹ/? (2) How do the ratings obtained in a paired-comparison task differ from those obtained in an identification task? (3) How does fatigue or experience affect ratings over a large number of repeated stimulus presentations?

Relative to the first question, two listener groups were compared: SLPs and age-matched peers of the children who provided the samples of /ɹ/ in treatment. Two stimulus presentation designs were compared: an identification task and a paired comparison task.
Factors affecting judgments and their reliability were examined, including age, bilingualism (native Canadian status), gender, task order (fatigue), and, for the SLPs, years of experience working with children (questions 1 and 3). Acoustic data were related to judgment results and variability. The following sections discuss results with respect to those questions.

4.2 Discussion of Specific Results

4.2.1 Listener Groups

Regarding the first question about listener groups, the SLPs had better intra-rater reliability than the child group for both tasks. SLPs also had better inter-rater reliability for the identification task than the children. This suggests that the age and maturity difference, and/or the training and experience difference, results in better internal consistency in some aspect of both tasks. This aspect may very well be goodness of category judgments or it may be the basic ability to make auditory comparisons. However, for both groups and both tasks, F3 was the only acoustic variable that correlated significantly with ratings, suggesting some commonality in perceptual cue use. Furthermore, there was no group difference in inter-rater reliability for the comparison task.

The groups had similar mean ratings for each stimulus on both tasks, although the SLPs had higher ratings for post-treatment stimuli and lower ratings for pre-treatment stimuli than the children. Although this difference was not statistically significant, it may imply that SLPs are more able to track progress during treatment or simply have treatment goals that do not reflect peer judgment. Since the trend is non-significant, this question is still unanswered. Further discussion of group comparisons follows below in relation to previous literature, stimuli ratings and listener variables.

4.2.1.1 Ratings of Stimuli

As noted, there were no significant differences between the two groups' mean identification scores for the stimuli.
This finding (although see the discussion on reliability below) does not lend support to studies that indicate phoneme boundary differences between children and adults (Menyuk & Anderson, 1969; Ohde & Sharf, 1988; Slawinski & Fitzgerald, 1998) and between SLPs and children (Sharf & Benson, 1983; Chaney, 1988). A number of factors might account for the divergent results. The present study used natural word productions truncated to form syllables with /ɹ/ in pre-vocalic and post-vocalic position. Some of the previous studies used natural word stimuli (Chaney, 1988) and some used a synthetic spectrum of syllable stimuli that allowed the researchers to calculate phoneme boundaries and determine significant group differences in these category boundaries (Ohde & Sharf, 1988; Sharf & Benson, 1983). Chaney's (1988) word stimuli may have resulted in a lexical effect that may have influenced parents more than SLPs in her study. However, neither the current nor past studies have used both natural and synthetic stimuli. Thus, it is not possible to determine whether the stimuli difference accounts for the divergent findings.

The use of post-vocalic /ɹ/ in the present study may also be an explanatory factor for the lack of demonstrated difference between SLPs and children. If the children were to demonstrate their bias on pre-vocalic /ɹ/ only, then the difference might become diluted by post-vocalic /ɹ/. A post-hoc analysis rejects this explanation, since syllables with pre-vocalic /ɹ/ were not rated more often as /ɹ/ by children, but actually received a lower rating.

Another possible explanation may be in the nature of the stimuli themselves (either from acoustic quality or from production quality by the speakers). If most of the stimuli occurred at the endpoints of the spectrum (being either really good or really poor exemplars), then SLPs and children might easily agree on ratings. This was also not the case, since inter-rater reliability was quite low.
A post-hoc analysis indicates that roughly half of the stimuli had under 70% agreement. This suggests that there should have been a sufficient number of ambiguous stimuli on which the SLPs and children could show their different biases. When children's and SLPs' comparison composite scores for each stimulus were compared with a t-test, four stimuli showed a significant difference between groups. Three of the items, 1NA4ar, 1NA5ar and 2NA5ar, all "ar" syllables, were rated higher by the children, whereas the other item, 2NA5ra, a "ra" syllable, was rated higher by the SLPs. The acoustic nature of these stimuli was diverse and there was no pattern that related to the group difference in scores. A possible factor in this difference is the lexical status of "ar", as well as a confound in the instructions, which asked "Which r is better?" and not "Which ra is better?".

As noted, the lack of demonstrated group difference between SLPs and children is puzzling and suggests an opportunity for further research to determine whether there are experimental factors that facilitate or eradicate this group difference.

4.2.1.2 Listener Variables

The MANOVA relating listener variables to comparison scores resulted in some significant factors; however, there were no consistent patterns and each listener variable was a significant factor for no more than 4 out of 16 stimuli. There was no indication that listener variables such as native Canadian status, age, years in Canada and gender were factors in comparison scores for the children. Nor was there sufficient evidence to indicate that age, years of experience with children under 12, status as a parent, or status as a native Canadian were factors in comparison scores for the SLPs. The two stimuli that showed significant variance based on the variables "version" and "age" were likely chance coincidences.
Measurements of mean position for the stimulus that showed variance with respect to "version" (2NA1ar) were quite similar: 66 and 66 for the versions in which it was rated highly (13.5 and 13.5), and 67 and 71 for the versions in which it was rated poorly (8.5 and 8.2). Although the mean position for the lower rated versions was higher, it was only a mean difference of 1 and 5 stimuli out of 120. For the stimulus that showed variance with respect to age, 2NA2ar, the scores seem to decrease as age increases. The acoustic variables for this stimulus indicate a good /ɹ/, with the lowest F3 of all the stimuli and a lower than average F3-F2 value. There does not seem to be any explanation beyond chance for the relationship between age and comparison score. Correlations and t-tests were applied to mean intra-rater reliability with respect to listener variables for both tasks and both groups. Again, no significant difference was found for any of the previously mentioned listener variables.

Children who were not native speakers of English did not perform differently from those who were on either task. Of the eight listeners who were not born in Canada, all listed either Mandarin or Cantonese as their other primary language; Cantonese does not have /ɹ/ in its phonemic inventory, but some dialects of Mandarin have a post-vocalic /ɹ/.

4.2.2 Task Comparison

Relative to the second question, stimuli ratings were similar for both tasks, indicating convergent validity for the two tasks. Both tasks also showed negative correlations with the accepted acoustic marker, F3 height (Espy-Wilson et al., 2000). This lends further support to the construct validity of both judgment tasks. There were some differences in that, for both groups, intra-rater reliability was better for the comparison task than for the identification task.
An intra-class correlation provided inter-rater reliability trends for different numbers of raters, showing inter-rater reliability also to be higher for the comparison task than for the identification task. This extends the notion suggested by Yiu, Chan & Mok (2007) that external anchors can improve reliability. It also suggests that category boundaries are a less stable factor than exemplars or prototypes and may in fact be theoretical and psychological artifacts, since the comparison task relied on relative goodness-of-category judgments and the identification task relied on phoneme boundary judgments.

4.2.3 Fatigue or Experience Effect

Half of the participants carried out the relatively short identification task before the relatively long comparison task and half carried out the two tasks in the reverse order. There was no significant difference between the identification task intra-rater reliabilities of these two groups for either the SLPs or the children. This is consistent with the research hypothesis that the SLPs would not suffer from fatigue effects and does not support the research hypothesis that the children would suffer from a boredom effect and become less reliable as a result.

4.3 Limitations to the Study

A number of aspects of the study may have introduced unwanted factors into the data. First of all, the presentation was long. Many of the children asked during the presentation when the task would end. This length may have introduced fatigue, which may have lowered reliability of scores (although statistical comparisons showed no obvious fatigue). Secondly, syllables were used instead of words. The lexical status of the two syllables, /ɑɹ/ and /ɹæ/, is different, /ɑɹ/ being the surface form of the word "are" and /ɹæ/ having no lexical status. Presumably, activation of the word "are" in the lexicon could result in a back-channel activation of the phonemes in /ɑɹ/, resulting in an increased activation of the phoneme representation of /ɹ/.
Another possible confound resulting from these syllables is that instructions asking participants to rate an "r" give preference to /ɑɹ/. The data suggest that /ɑɹ/ was preferred over /ɹæ/. However, since there are equal numbers of /ɑɹ/ and /ɹæ/ tokens distributed across tasks and across pre-treatment and post-treatment conditions, this does not have any implications for the main findings of the study. In terms of the stimuli themselves, although the syllables somewhat factored out lexical effects, the syllables had to be sliced from the words, resulting in unnatural sounding onsets or offsets where the word was cut. Furthermore, the recordings were field quality and noisy. The noise may have introduced uncertainty into formant measurements and may have increased variability in ratings. However, part of the rationale was to use field quality recordings because everyday conversation is noisy, and valid results with noisy recordings are more generalizable to daily situations.

4.4 Implications of the Study

4.4.1 Research Implications

This research study was concerned with design factors regarding the general question: what makes a reliable /ɹ/ judgment? There are a few answers to this question that can be useful for future research. The comparison task was shown to have better inter-rater reliability than the identification task. This suggests that future research might use external anchors in order to improve reliability. Although the comparison task does not identify a phoneme boundary, it is capable of showing a treatment effect more reliably. A significant limitation to the comparison task is the length of time required for the rating of every pair-wise permutation of stimuli. In the present study, the number of stimuli was limited to 16 in order to keep experiment time at roughly half an hour. Other tasks that use external anchors might be investigated as a way of increasing reliability.
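
The time cost of the comparison task follows from how quickly the number of pairs grows with the number of stimuli. The sketch below is only an illustration, assuming that each unordered pair of distinct stimuli is presented once; that assumption is consistent with the figure of 120 mentioned in Section 4.2.1.2, but the actual design may also have counterbalanced presentation order or repeated pairs. The function names and token labels are hypothetical.

```python
from itertools import combinations


def paired_comparison_trials(stimuli):
    """Return every unordered pair of distinct stimuli, each pair once."""
    return list(combinations(stimuli, 2))


def n_pairs(n):
    """Closed-form count of unordered pairs among n stimuli."""
    return n * (n - 1) // 2


# Hypothetical labels standing in for the 16 syllable tokens.
stimuli = [f"token{i:02d}" for i in range(1, 17)]
trials = paired_comparison_trials(stimuli)
assert len(trials) == n_pairs(len(stimuli))
print(len(trials))  # 120

# How the trial count would grow with other stimulus set sizes.
for n in (8, 16, 24, 32):
    print(n, n_pairs(n))
```

Because the count grows roughly with the square of the number of stimuli, doubling the set from 16 to 32 would roughly quadruple the number of trials under this assumption, which is why the set size had to be kept small.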
One research issue that could be addressed in more detail is the acoustic correlates of perceptual differences. In the present study, syllable duration and static measurements of the first three formants were used. Only F3 was shown to have a significant effect on perception, but there were differences in stimuli ranking by the two listener groups that could not be explained acoustically. There was also a considerable amount of variability within groups that might be explained through more acoustic measures. More detailed acoustic measures might include dynamic measures of F2, or static and/or dynamic measures of F4 and F5 (as have recently been found useful in characterizing variants of /ɹ/ by, for example, Zhou et al., 2007).

4.4.2 Theoretical Implications

Differences in variability were observed between SLPs and children. For the comparison task, a task that uses an external anchor, SLPs and children had similar inter-rater reliability. However, for the identification task, SLPs had better inter-rater reliability. This indicates that for the SLPs, some aspect of the internal representation of /ɹ/ is more uniform across individuals. This aspect is likely the phoneme boundary. Whether this difference is related to the age difference, phonetic training, or experience in treatment of /ɹ/ could not be determined by this study. What was apparent was that a relative comparison of phonemes in relation to internal representations is a more reliable task than determining whether phonemes adequately match internal representations. The implication is that the determination of adequacy is an individual matter. Prototype theories of speech perception do not offer much explanation of adequacy-of-fit judgments since they only describe an ideal or prototypical representation. Exemplar models may have more explanatory power for adequacy-of-fit judgments since they deal with a range of internal representations, some of which may be near phoneme boundaries and may help to define them.
Possibly, SLPs have more stored exemplars near the phoneme boundary, allowing them to make better judgments.

4.4.3 Clinical Implications

Clinicians need to be reliable in their ratings so that they correctly determine whether a treatment goal has been reached. This study suggests that individual clinicians may not be reliable enough on their own to make consistently good judgments of /ɹ/. Ways of improving reliability include using a comparison task and using more than one rater. One way of supplementing a single rater's perceptual judgments would be to examine F3 data. Spectrographic analysis could be used as an objective measure of progress and as a biofeedback device. Published norms (e.g., Lee et al., 1999) could be used as a reference for setting targets.

4.5 Conclusion

Making phoneme category decisions is a difficult task. These decisions are subject to intra-rater and inter-rater variability. On a phoneme identification task for /ɹ/, SLPs had significantly better reliability than children. On a paired comparison task for /ɹ/, SLPs and children both had better reliability than on the phoneme identification task. The clinical implication is that reliability of perception can be improved by task design and by using an objective measurement. Theoretically, these findings suggest that category boundaries are less uniform across individuals than internal representations as described in exemplar or prototype theories of speech perception.

References

Adler-Bock, M., Bernhardt, B., Gick, B., & Bacsfalvi, P. (2007). The use of ultrasound in remediation of /ɹ/ in adolescents. American Journal of Speech-Language Pathology, 16(2), 128-139.
Bernhardt, B.M., Bacsfalvi, P., Adler-Bock, M., Shimizu, R., Cheney, A., Giesbrecht, N., O'Connell, M., Sirianni, J., & Radanov, B. (2007). Ultrasound as visual feedback: Consultative use in rural areas. Unpublished manuscript.
Bernhardt, B., Bacsfalvi, P., Gick, B., Radanov, B., & Williams, R. (2005).
Exploring electropalatography and ultrasound in speech habilitation. Journal of Speech-Language Pathology and Audiology, 29, 169-182.
Bernhardt, B., & Stemberger, J. (1998). Handbook of phonological development: From the perspective of constraint-based nonlinear phonology. San Diego, CA: Academic Press.
Boersma, P., & Weenink, D. (2006). Praat (Version 4.4.13) [Computer software]. Amsterdam, The Netherlands: University of Amsterdam.
Brunk, H.D. (1960). Mathematical models for ranking from paired comparisons. Journal of the American Statistical Association, 55(291), 503-520.
Chaney, C. (1988). Identification of correct and misarticulated semivowels. Journal of Speech and Hearing Disorders, 53, 252-261.
Chignell, M.H., & Patty, B.W. (1987). Unidimensional scaling with efficient ranking methods. Psychological Bulletin, 101(2), 304-311.
Clark, C.E., Schwarz, I.E., & Blakely, R.W. (1993). The removable R-appliance as a practice device to facilitate correct production of /ɹ/. American Journal of Speech-Language Pathology, 2, 84-92.
Ebel, R. (1951). Estimation of reliability of ratings. Psychometrika, 15, 407-424.
Espy-Wilson, C.Y. (1992). Acoustic measures for linguistic features distinguishing the semivowels /w j r l/ in American English. Journal of the Acoustical Society of America, 92, 736-757.
Espy-Wilson, C.Y., & Boyce, S.E. (1999). A simple tube model for the American English /ɹ/. In J.J. Ohala, Y. Hasegawa, M. Ohala, D. Granville, & A.C. Bailey (Eds.), Proceedings of the XIVth International Congress of Phonetic Sciences (pp. 2137-2140). Berkeley, CA: University of California, Berkeley.
Espy-Wilson, C.Y., Boyce, S.E., Jackson, M., Narayanan, S., & Alwan, A. (2000). Acoustic modelling of American English /ɹ/. Journal of the Acoustical Society of America, 108, 343-356.
Flipsen, P., Shriberg, L.D., Weismer, G., Karlsson, H.B., & McSweeny, J.L. (2001). Acoustic phenotypes for speech-genetics studies: Reference data for residual /ɝ/ distortions.
Clinical Linguistics and Phonetics, 15, 603-630.
Ganong, W.F., III (1980). Phonetic categorization in auditory word representation. Journal of Experimental Psychology: Human Perception and Performance, 6(1), 110-125.
Gerrits, E., & Schouten, M.E.H. (2004). Categorical perception depends on the discrimination task. Perception & Psychophysics, 66, 363-376.
Gick, B., & Campbell, F. (2003). Intergestural timing in English /ɹ/. In M.J. Sole, D. Recasens, & J. Romero (Eds.), Proceedings of the XVth International Congress of Phonetic Sciences, Barcelona, Spain (pp. 1911-1914). Barcelona: Universitat Autonoma de Barcelona.
Hall, B.J.C. (1991). Attitudes of fourth and sixth graders toward peers with mild articulation disorders. Language, Speech, and Hearing Services in Schools, 22, 334-340.
Hashi, M., Honda, K., & Westbury, J. (2003). Time-varying acoustic and articulatory characteristics of American English [r]: A cross-speaker study. Journal of Phonetics, 31, 3-22.
Hoffman, P.R., Daniloff, R.G., Bengoa, D., & Schuckers, G.H. (1985). Misarticulating and normally articulating children's identification and discrimination of synthetic [ɹ] and [w]. Journal of Speech and Hearing Disorders, 50, 46-53.
Hoffman, P.R., Stager, S., & Daniloff, R.G. (1983). Perception and production of misarticulated /ɹ/. Journal of Speech and Hearing Disorders, 48, 210-215.
Huer, M.B. (1989). Acoustic tracking of articulation errors. Journal of Speech and Hearing Disorders, 54, 530-534.
Inter-rater Reliability Calculator. Retrieved May 28, 2007, from Medical Education Online's Inter-rater Reliability Calculator Web site: http://www.med-ed-online.org/rating/reliability.html
Iverson, P., & Kuhl, P.K. (1995). Mapping the perceptual magnet effect for speech using signal detection theory and multidimensional scaling. Journal of the Acoustical Society of America, 97, 553-562.
Kuhl, P.K. (1993).
Innate predispositions and the effects of experience in speech perception: The native language magnet theory. In B. de Boysson-Bardies, S. de Schonen, P. Jusczyk, P. MacNeilage, & J. Morton (Eds.), Developmental neurocognition: Speech and face processing in the first year of life (pp. 259-274). Dordrecht, Netherlands: Kluwer Academic Publishers.
Lee, S., Potamianos, A., & Narayanan, S. (1999). Acoustics of children's speech: Developmental changes of temporal and spectral parameters. Journal of the Acoustical Society of America, 105, 1455-1468.
Liberman, A.M., Harris, K.S., Hoffman, H.S., & Griffith, B.C. (1957). The discrimination of speech sounds within and across phoneme boundaries. Journal of Experimental Psychology, 54(5), 358-368.
Massaro, D. (1992). Broadening the domain of the fuzzy logical model of perception. In H.L. Pick Jr., P. van den Broek, & D.C. Knill (Eds.), Cognition: Conceptual and methodological issues (pp. 51-84). Washington, DC: American Psychological Association.
Masterson, J., & Bernhardt, B. (2001). Computerized Articulation and Phonology Evaluation System (CAPES). San Antonio, TX: The Psychological Corporation.
McMurray, B., & Aslin, R.N. (2005). Infants are sensitive to within-category variation in speech perception. Cognition, 95, B15-B26.
Menyuk, P., & Anderson, S. (1969). Children's identification and reproduction of /w/, /ɹ/, and /l/. Journal of Speech and Hearing Research, 12, 39-52.
Modha, G., Bernhardt, B., Church, R., & Bacsfalvi, P. (in press). A case study using ultrasound to treat English /ɹ/. The International Journal of Language and Communication Disorders.
Ohde, R.N., & Sharf, D.J. (1988). Perceptual categorization and consistency of synthesized /ɹ-w/ continua by adults, normal children and /ɹ/-misarticulating children. Journal of Speech and Hearing Research, 31, 556-568.
Pollack, A. (2006). Doctors researching drugs to combat stuttering. Retrieved September 20, 2006, from Worcester Telegram and Gazette News.
Web site: http://www.telegram.com/apps/pbcs.dll/article?AID=/20060918/NEWS/609180332/1012
Ruscello, D.M. (1995). Visual feedback in treatment of residual phonological disorders. Journal of Communication Disorders, 28, 279-302.
Schouten, B., Gerrits, E., & van Hessen, A. (2003). The end of categorical perception as we know it. Speech Communication, 41, 71-80.
Sharf, D.J., & Benson, P.J. (1982). Identification of synthesized /r-w/ continua for adult and child speakers. Journal of the Acoustical Society of America, 71(4), 1008-1015.
Sharf, D.J., & Benson, P.J. (1983). Comparison of speech-language pathologists' and naive listeners' identification of synthesized /ɹ-w/ continua. Journal of Speech and Hearing Research, 26, 525-530.
Sharf, D.J., Ohde, R.N., & Lehman, M.E. (1988). Relationship between the discrimination of /w-ɹ/ and /t-d/ continua and the identification of distorted /ɹ/. Journal of Speech and Hearing Research, 31, 193-206.
Shelton, R.L., Johnson, A., & Arndt, W.B. (1974). Variability in judgments of articulation when observer listens repeatedly to the same phone. Perceptual and Motor Skills, 39, 327-332.
Shriberg, L., Flipsen, P., Karlsson, H., & McSweeny, J. (2001). Acoustic phenotypes for speech-genetics studies: An acoustic marker for residual /ɹ/ distortions. Clinical Linguistics & Phonetics, 15, 631-650.
Shrout, P.E., & Fleiss, J.L. (1979). Intraclass correlation: Uses in assessing rater reliability. Psychological Bulletin, 86, 420-428.
Shuster, L.I. (1998). The perception of correctly and incorrectly produced /ɹ/. Journal of Speech, Language & Hearing Research, 41, 941-950.
Shuster, L., Ruscello, D., & Smith, K. (1992). Evoking /r/ using visual feedback. American Journal of Speech-Language Pathology, 1, 29-34.
Shuster, L., Ruscello, D., & Toth, A. (1995). The use of visual feedback to elicit correct /ɹ/. American Journal of Speech-Language Pathology, 4, 37-44.
Silverman, F.H., & Paulus, P.G. (1989).
Peer reactions to teenagers who substitute /w/ for /ɹ/. Language, Speech, and Hearing Services in Schools, 20, 219-221.
Slawinski, E., & Fitzgerald, L.K. (1998). Perceptual development of the categorization of the /ɹ-w/ contrast in normal children. Journal of Phonetics, 26, 27-43.
Warren, R.M., & Warren, R.P. (1970). Auditory illusions and confusions. Scientific American, 223, 30-36.
Werker, J.F., & Desjardins, R.N. (1995). Listening to speech in the 1st year of life: Experiential influences on phoneme perception. Current Directions in Psychological Science, 4(3), 76-81.
Werker, J.F., & Lalonde, C.E. (1988). Cross-language speech perception: Initial capabilities and developmental change. Developmental Psychology, 24(5), 672-693.
Wolfe, V., Martin, D., Borton, M., & Youngblood, H.C. (2003). The effects of clinical training on cue trading for the /r-w/ contrast. American Journal of Speech-Language Pathology, 12, 221-228.
World Health Organization. International Classification of Functioning, Disability and Health. Retrieved October 18, 2006, from http://www3.who.int/icf/icftemplate.cfm
Yiu, E.M.L., Chan, K.M.K., & Mok, R.S.M. (2007). Reliability and confidence in using a paired comparison paradigm in perceptual voice quality evaluation. Clinical Linguistics and Phonetics, 21, 129-145.
Zhou, X., Espy-Wilson, C., Tiede, M., & Boyce, S. (2007). Acoustic cues of "retroflex" and "bunched" American English rhotic sound. Journal of the Acoustical Society of America, 121(5), Pt. 2, 3168.

Appendix A: Questionnaire-Demographics

Adults Slide 1
Please take a moment to give us a little information about yourself:
1. I am female / I am male
2. I am 20-30 / 31-40 / 41-50 / 51-60 / 60+ years old
3. I was born in Canada. YES / NO - I came to Canada ___ years ago.
4. I speak: [English]
5. At home we speak: [English]

Adults Slide 2
1. I live in [Richmond] (city, country). Work setting: [School District] NO
2. Job title: [SLP]
3. I am a parent
4. I have ___ children, age/gender: [11/M]
5. I have [123] years experience with kids under 12. Currently -or- This many years ago: ___

Adults Slide 3
1. Are you a parent of a child who has, or has had, difficulties pronouncing sounds after kindergarten? YES / NO
I was born in Canada. YES / NO -> I am ___
Have you ever had any difficulties with speech sounds? YES / NO
How is your hearing? No problems / Undiagnosed problem / Diagnosed problem
Have you had your hearing checked, and how long ago since the check-up?
I would like to have my hearing checked please

Children Slide 1
I am a boy / I am a girl
___ years old.
4. I came to Canada ___ years ago.
5. I speak ___
6. My mom speaks ___. My dad speaks ___.

Appendix B: Stimulus Presentation Slides

Distracter slide example - no sound played
Identification Task: Is this an R? YES / NO
Comparison Task: Which R is better? This one / This one

Appendix C: Ethics Form

UBC The University of British Columbia
Office of Research Services, Behavioural Research Ethics Board
Suite 102, 6190 Agronomy Road, Vancouver, B.C. V6T 1Z3

CERTIFICATE OF APPROVAL - FULL BOARD

PRINCIPAL INVESTIGATOR: Barbara M. Bernhardt
INSTITUTION / DEPARTMENT: UBC/Medicine, Faculty of/Audiology & Speech Sciences
UBC BREB NUMBER: H06-03610
INSTITUTION(S) WHERE RESEARCH WILL BE CARRIED OUT: Institution: UBC; Site: Point Grey Site
Other locations where the research will be conducted: James Mather Building, University of British Columbia, Rooms 222, 221, 223; schools in the Lower Mainland (after receiving permission from the school boards, local schools); YMCA Child Care after-school program which takes place in the local schools (i.e., approval from both the school and the after-school program is being sought once conditional approval has been obtained from BREB)
CO-INVESTIGATOR(S): Bosko Radanov, Benjamin Perry
SPONSORING AGENCIES: N/A
PROJECT TITLE:
REB MEETING DATE: November 23, 2006
CERTIFICATE EXPIRY DATE: November 23, 2007
DATE APPROVED: January 8, 2007

DOCUMENTS INCLUDED IN THIS APPROVAL:
Consent Forms:
Teacher consent (Version 2, December 11, 2006)
SLP Consent Form (Version 2, December 11, 2006)
Parent consent (Version 2, December 11, 2006)
Assent Forms:
Child assent (Version 2, December 11, 2006)
Advertisements:
AdSLP (Version 2, December 11, 2006)
Adchildparent (Version 2, December 11, 2006)
Adteacher (Version 2, December 11, 2006)
Letter of Initial Contact:
Initial Contact to Schools (Version 1, November 9, 2006)
Initial Contact to YMCA Child Care Program, Vancouver (Version 1, December 11, 2006)

The application for ethical review and the document(s) listed above have been reviewed and the procedures were found to be acceptable on ethical grounds for research involving human subjects.

Approval is issued on behalf of the Behavioural Research Ethics Board and signed electronically by one of the following:
Dr. Peter Suedfeld, Chair
Dr. Jim Rupert, Associate Chair
Dr. Arminee Kazanjian, Associate Chair
Dr. M. Judith Lynam, Associate Chair
