The effect of task difficulty on speech convergence

by

Jennifer Colleen Abel

B.A., Queen’s University, 1999
M.A., University of Calgary, 2003

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES
(Linguistics)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

June 2015

© Jennifer Colleen Abel, 2015

Abstract

Speech convergence is the tendency of talkers to become more similar to someone they are listening or talking to, whether that person is a conversational partner or merely a voice heard repeating words. The cause of this phenomenon is unknown: it may be related to a general link between perception and behaviour (Dijksterhuis & Bargh, 2001), a coupling between speech production and speech perception systems (Pickering & Garrod, 2013), or an effort to minimize social distance between interlocutors (Giles et al., 1991). How convergence is facilitated or inhibited by various factors (e.g., gender, dialect, level of attention) can help pinpoint the reasons behind it. One as-yet unexamined factor in this regard is cognitive workload, i.e., the information processing load a person experiences when performing a task. The harder the task, the greater the cognitive workload. This study examines the effect of different levels of task difficulty on speech convergence within dyads collaborating on a task. Dyad members had to build identical LEGO® constructions without being able to see each other’s construction, and with each member having half of the instructions required to complete the construction. Three levels of task difficulty were created, with five dyads at each level (30 participants total). Listeners (n = 62) who heard pairs of utterances from each dyad judged convergence to be occurring in the Easy condition and to a lesser extent in the Medium condition, but not in the Hard condition. Acoustic similarity analyses of the same utterance pairs using amplitude envelopes and mel-frequency cepstral coefficients showed convergence on the part of some dyads but divergence on the part of others, with no clear effect of difficulty. Speech rate and pausing behaviour, both of which can demonstrate convergence (e.g., Pardo et al., 2013a) and be affected by workload (e.g., Lively et al., 1993; Khawaja, 2010), also showed both convergence and divergence, with difficulty possibly playing a role. The results suggest that difficulty affects speech convergence, but that it may do so differently for different talkers. Factors such as whether talkers are giving or receiving instructions also seem to interact with difficulty in affecting convergence.

Preface

This research project was conceived by Jennifer Abel, and designed by Jennifer Abel with assistance from Molly Babel, Carla Hudson Kam, and Eric Vatikiotis-Bateson. Data collection and preparation (including transcription of audio files) was performed by Jennifer Abel. Statistical analyses performed by Jennifer Abel and Molly Babel, with assistance from Carla Hudson Kam. Written by Jennifer Abel. The research in this dissertation was conducted under the auspices of UBC Behavioural Research Ethics Board certificate number H12-02235.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
Chapter 1: Introduction
1.1 The question at hand
1.2 Convergence behaviour in speech
1.2.1 Proposed causes of speech convergence
1.2.1.1 A general link between perception and behaviour
1.2.1.2 A coupling between speech production and speech perception
1.2.1.3 An accommodation of one’s communicative behaviour to that of others
1.2.2 Factors affecting convergence
1.3 Task difficulty and speech behaviour
1.3.1 Speech production under cognitive workload
1.3.2 Speech perception under cognitive workload
1.3.3 Proposed causes of workload-based effects on speech behaviour
1.4 Does task difficulty affect speech convergence?
1.4.1 Potential interactions of speech convergence and task difficulty, and possible theoretical explanations
1.4.1.1 Task difficulty leads to reduced speech convergence
1.4.1.2 Task difficulty leads to greater speech convergence
1.4.1.3 Task difficulty shows mixed effects on speech convergence
1.4.1.4 Task difficulty shows no effect on speech convergence
1.4.2 Methodological implications of task difficulty for studying speech convergence
1.5 The current study: Testing the effect of task difficulty on speech convergence
1.6 Outline of the dissertation
Chapter 2: Corpus collection
2.1 Introduction
2.2 Methods
2.2.1 Construction task design
2.2.2 Participants
2.2.3 Personality and cognitive measures collection
2.2.4 Construction task
2.2.5 Preparation of participant recordings
2.3 Results
2.3.1 Corpus statistics: time and error rates
2.3.2 Corpus statistics: gross speaking time
2.3.3 Samples of conversations
2.4 Summary
Chapter 3: Task difficulty and perceived convergence
3.1 Introduction
3.2 Methods
3.2.1 Materials: phrase selection
3.2.2 Procedure
3.2.3 Participants
3.3 Results
3.4 Discussion
Chapter 4: Task difficulty and global acoustic measures of convergence
4.1 Introduction
4.2 Methods
4.2.1 Materials
4.2.2 Procedures
4.3 Results
4.3.1 Amplitude envelope measurement results
4.3.2 MFCC results
4.4 Discussion
Chapter 5: Task difficulty and speech rate convergence
5.1 Introduction
5.2 Methods
5.2.1 Measuring speaking and articulation rate differences
5.2.2 Measuring cross-correlation in speech and articulation rates
5.3 Results
5.3.1 Overall speaking and articulation rates
5.3.1.1 Words per second
5.3.1.2 Syllables per second
5.3.2 Speaking and articulation rate differences between partners
5.3.2.1 Speaking and articulation rate differences in each step
5.3.2.2 Speaking and articulation rate differences in Giving steps
5.3.2.3 Speaking and articulation rate differences in Receiving steps
5.3.3 Time-series cross-correlation of speaking and articulation rates
5.3.3.1 Words per second speaking and articulation rate cross-correlation
5.3.3.2 Syllables per second speaking and articulation rate cross-correlation
5.4 Discussion
Chapter 6: Task difficulty and pausing convergence
6.1 Introduction
6.2 Methods
6.3 Results
6.3.1 Overall pausing differences
6.3.1.1 Silent pauses
6.3.1.2 Filled pauses
6.3.2 Pausing differences between partners
6.3.2.1 Pausing differences in each step
6.3.2.2 Pausing differences in Giving steps
6.3.2.3 Pausing differences in Receiving steps
6.4 Discussion
Chapter 7: General discussion
7.1 Summary of the study
7.2 General discussion
References
Appendices
Appendix A: LEGO construction designs
A.1 Easy design
A.2 Medium design
A.3 Hard design
A.4 Inventory of LEGO pieces used
Appendix B: Participant results on personality and cognitive measures
B.1 Big Five and Autism Spectrum Quotient (AQ) scores
B.2 RSPAN and mental rotation scores
Appendix C: Language background information form for all participants
Appendix D: Post-task questionnaire for construction task
Appendix E: Stimuli for perception task and acoustic similarity analyses
E.1 Easy condition stimuli
E.2 Medium condition stimuli
E.3 Hard condition stimuli
Appendix F: Syllabification of words not in the CMU Pronouncing Dictionary

List of Tables

Table 2.1 Design information for the LEGO construction task
Table 2.2 Completion time statistics for each condition (standard deviations in parentheses)
Table 2.3 Error rates for each condition
Table 2.4 Completion time and error information for each dyad
Table 2.5 Speaking time in Giving steps by Condition (standard deviation in parentheses)
Table 2.6 Speaking time in Receiving steps by Condition (standard deviation in parentheses)
Table 5.1 Mean absolute words per second speaking rate differences by step in Early and Late portions of the conversations (standard deviations in parentheses)
Table 5.2 Mean absolute words per second articulation rate differences by step in Early and Late portions of the conversations (standard deviations in parentheses)
Table 5.3 Mean absolute syllables per second speaking rate differences by step in Early and Late portions of the conversations (standard deviations in parentheses)
Table 5.4 Mean absolute syllables per second articulation rate differences by step in Early and Late portions of the conversations (standard deviations in parentheses)
Table 5.5 Mean absolute words per second speaking rate differences by step in Early and Late portions of the conversations, Giving steps only (standard deviations in parentheses)
Table 5.6 Mean absolute words per second articulation rate differences in Early and Late portions of the conversations, Giving steps only (standard deviations in parentheses)
Table 5.7 Mean absolute syllables per second speaking rate difference in Early and Late portions of the conversations, Giving steps only (standard deviations in parentheses)
Table 5.8 Mean absolute syllables per second articulation rate difference in Early and Late portions of the conversations, Giving steps only (standard deviations in parentheses)
Table 5.9 Mean absolute words per second speaking rate differences in Early and Late portions of the conversation, Receiving steps only (standard deviations in parentheses)
Table 5.10 Mean absolute words per second articulation rate differences in Early and Late portions of the conversation, Receiving steps only (standard deviations in parentheses)
Table 5.11 Mean absolute syllables per second speaking rate differences in Early and Late portions of the conversation, Receiving steps only (standard deviations in parentheses)
Table 5.12 Mean absolute syllables per second articulation rate differences in Early and Late portions of the conversation, Receiving steps only (standard deviations in parentheses)
Table 5.13 Time-series cross-correlation results for words per second speaking rates
Table 5.14 Time-series cross-correlation results for words per second articulation rates
Table 5.15 Time-series cross-correlation results for syllables per second speaking rates
Table 5.16 Time-series cross-correlation results for syllables per second articulation rates
Table 6.1 Mean absolute silent pause per second differences by step (standard deviations in parentheses)
Table 6.2 Mean absolute silent pause percentage differences by step (standard deviations in parentheses)
Table 6.3 Mean absolute filled pause per second differences by step (standard deviations in parentheses)
Table 6.4 Mean absolute filled pause percentage differences by step (standard deviations in parentheses)
Table 6.5 Mean absolute silent pause per second differences, Giving steps only (standard deviations in parentheses)
Table 6.6 Mean absolute silent pause percentage differences, Giving steps only (standard deviations in parentheses)
Table 6.7 Mean absolute filled pause rate differences, Giving steps only (standard deviations in parentheses)
Table 6.8 Mean absolute filled pause percentage differences, Giving steps only (standard deviations in parentheses)
Table 6.9 Mean absolute silent pause per second differences, Receiving steps only (standard deviations in parentheses)
Table 6.10 Mean absolute silent pause percentage differences, Receiving steps only (standard deviations in parentheses)
Table 6.11 Mean absolute filled pause per second differences, Receiving steps only (standard deviations in parentheses)
Table 6.12 Mean absolute filled pause percentage differences, Receiving steps only (standard deviations in parentheses)
Table 7.1 Heat map of dyads’ similarity over time on the 27 measures examined in this study. ●: dyads became more similar (≥ 5% change from starting value) over time. ●: dyads became slightly more similar (1-4.99% change) over time. ●: dyads remained stable over time (0-0.99% change) on the measure. ●: dyads became slightly less similar (1-4.99% change) over time. ●: dyads became less similar (≥ 5% change) over time. Perc: perceived similarity. Acous: global acoustic similarity. AE: amplitude envelope analysis. MFCC: mel-frequency cepstral coefficient analysis. Syll: syllable. Sp: speaking rate. Art: Articulation rate.
Table A.1 Inventory of pieces in designs
Table A.2 Inventory of extra pieces

List of Figures

Figure 2.1 First two steps of each design. A.: Easy condition. B.: Medium condition. C.: Hard condition. Full designs are given in Appendix A.
Figure 2.2 Room setup for construction task
Figure 2.3 Mean speaking time in seconds in Giving and Receiving steps in each condition.
Figure 3.1 Mean similarity ratings by condition and conversation third. Error bars indicate ± 1 standard error of the mean.
Figure 3.2 Mean similarity rating by dyad in each conversation third. Error bars indicate ± 1 standard error of the mean.
Figure 4.1 Mean amplitude envelope similarity values by condition and conversation third.
Figure 4.2 Mean amplitude envelope similarity by dyad.
Figure 4.3 Mean MFCC similarity by condition and conversation third.
Figure 4.4 Mean MFCC similarity by dyad.
Figure 5.1 Boxplot of words per second speaking rates by Condition and Type
Figure 5.2 Boxplot of words per second articulation rates by Condition and Type
Figure 5.3 Boxplot of syllables per second speaking rates by Condition and Type
Figure 5.4 Boxplot of syllables per second articulation rates by Condition and Type
Figure 5.5 Absolute Early and Late words per second speaking rate differences in each dyad by Condition. Error bars indicate ± 1 standard error of the mean. Higher values indicate larger absolute differences between talkers’ words per second speaking rates.
Figure 5.6 Absolute Early and Late words per second articulation rate differences in each dyad by Condition. Error bars indicate ± 1 standard error of the mean. Higher values indicate larger absolute differences between talkers’ words per second articulation rates.
Figure 5.7 Absolute Early and Late syllables per second speaking rate differences in each dyad by Condition. Error bars indicate ± 1 standard error. Higher values indicate larger differences between dyad partners’ syllables per second speaking rates.
Figure 5.8 Absolute Early and Late syllables per second articulation rate differences in each dyad by Condition. Error bars indicate ± 1 standard error. Higher values indicate larger differences between dyad partners’ syllables per second articulation rates.
Figure 5.9 Absolute Early and Late words per second speaking rate differences in each dyad, Giving steps only, by Condition. Error bars indicate ± 1 standard error of the mean. Higher values indicate a larger absolute difference between dyad partners’ words per second speaking rates in the Giving steps.
Figure 5.10 Absolute Early and Late words per second articulation rate differences in each dyad, Giving steps only, by Condition. Error bars indicate ± 1 standard error of the mean. Higher values indicate a larger absolute difference between dyad partners’ words per second articulation rates in the Giving steps.
Figure 5.11 Absolute Early and Late syllables per second speaking rate differences in each dyad, Giving steps only, by Condition. Error bars indicate ± 1 standard error of the mean. Higher values indicate larger absolute differences between the dyad partners’ syllables per second speaking rates in the Giving steps.
Figure 5.12 Absolute Early and Late syllables per second articulation rate differences in each dyad, Giving steps only, by Condition. Error bars indicate ± 1 standard error of the mean. Higher values indicate larger absolute differences between the dyad partners’ syllables per second articulation rates in the Giving steps.
Figure 5.13 Absolute Early and Late words per second speaking rate differences in each dyad, Receiving steps only, by Condition. Error bars indicate ± 1 standard error of the mean. Higher values indicate larger absolute differences between the dyad partners’ words per second speaking rates in the Receiving steps.
Figure 5.14 Absolute Early and Late words per second articulation rate differences in each dyad, Receiving steps only, by Condition. Error bars indicate ± 1 standard error of the mean. Higher values indicate larger absolute differences between the dyad partners’ words per second articulation rates in the Receiving steps.
Figure 5.15 Absolute Early and Late syllables per second speaking rate differences in each dyad, Receiving steps only, by Condition. Error bars indicate ± 1 standard error of the mean. Higher values indicate larger absolute differences between the dyad partners’ syllables per second speaking rate values in the Receiving steps.
Figure 5.16 Absolute Early and Late syllables per second articulation rate differences in each dyad, Receiving steps only, by Condition. Error bars indicate ± 1 standard error of the mean. Higher values indicate larger absolute differences between the dyad partners’ syllables per second articulation rates in the Receiving steps.
Figure 6.1 Boxplot of silent pause per second values by Condition and Type
Figure 6.2 Boxplot of silent pause percentages by Condition and Type
Figure 6.3 Boxplot of filled pause per second values by Condition and Type
Figure 6.4 Boxplot of filled pause percentage by Condition and Type
Figure 6.5 Absolute Early and Late silent pause per second differences in each dyad by Condition. Error bars indicate ± 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners’ silent pause per second rates.
Figure 6.6 Absolute Early and Late silent pause percentage differences in each dyad by Condition. Error bars indicate ± 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners’ silent pause values.
Figure 6.7 Absolute Early and Late filled pause per second differences in each dyad by Condition. Error bars indicate ± 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners’ filled pause per second values.
Figure 6.8 Absolute Early and Late filled pause percentage differences in each dyad by Condition. Error bars indicate ± 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners’ filled pause percentage values.
Figure 6.9 Absolute Early and Late silent pause per second differences in each dyad, Giving steps only, by Condition. Error bars indicate ± 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners’ silent pause rates.
Figure 6.10 Absolute Early and Late silent pause percentage differences in each dyad, Giving steps only, by Condition. Error bars indicate ± 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners’ silent pause percentage values.
Figure 6.11 Absolute Early and Late filled pause per second differences in each dyad, Giving steps only, by Condition. Error bars indicate ± 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners’ filled pause per second values.
Figure 6.12 Absolute Early and Late filled pause percentage differences in each dyad, Giving steps only, by Condition. Error bars indicate ± 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners’ filled pause percentage values.
Figure 6.13 Absolute Early and Late silent pause per second differences in each dyad, Receiving steps only, by Condition. Error bars indicate ± 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners’ silent pause per second values.
Figure 6.14 Absolute Early and Late silent pause percentage differences in each dyad, Receiving steps only, by Condition. Error bars indicate ± 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners’ silent pause percentage values.
Figure 6.15 Absolute Early and Late filled pause per second differences in each dyad, Receiving steps only, by Condition. Error bars indicate ± 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners’ filled pause rates.
Figure 6.16 Absolute Early and Late filled pause percentage differences in each dyad, Receiving steps only, by Condition. Error bars indicate ± 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners’ filled pause percentages.

Acknowledgements

Thanks to my supervisor, Molly Babel. We started at UBC at the same time, and we’ve been on quite a journey together over the last six years.
I’m a phonetician because of you, Molly; thank you for your expertise, your encouragement, and your boundless enthusiasm.

Thanks to my committee members, Carla Hudson Kam and Eric Vatikiotis-Bateson. Great questions, great points, great advice; this dissertation benefited tremendously from your guidance.

Financial support for this dissertation was provided in part by the Social Sciences and Humanities Research Council of Canada, by the University of British Columbia and the UBC Faculty of Arts, and by the late Charlie Abel.

Thanks to UBC Linguistics: the faculty, staff, and graduate and undergraduate students. I’ve worked with so many of you in various capacities over the years, and you’ve all taught me something. Particular thanks to Joe Stemberger for the use of his lab space, and to Edna Dharmaratne and Shaine Meghji, without whom the whole enterprise would cease to work.

Thanks to all the members of the Speech in Context Lab. You’re doing great work up there in our little attic: keep it up! Special thanks to all my LEGO task piloters: Kaitlin Sanders, Jamie Russell, Phoebe Wong, Sophie Walters, David Haist, Anahita Rustom, Maria Lenart, and Joe D’Aquisto.

Thanks to UBC’s Research Commons, which I’ve been tremendously proud to be a part of during the second half of my tenure here at UBC.

Thanks to all my friends, both those I knew before and those I’ve met during the last six years, who’ve been along for the ride.

Thanks to my siblings, Robert and Alison.

And, as always and forever, thanks to my parents, Elizabeth and Douglas.

Chapter 1: Introduction

1.1 The question at hand

As talkers, we often change how we speak based on who we are talking to or listening to, or based on the circumstances we are speaking in. For example, talking to someone who speaks more quickly than we do may cause us to increase our own speech rate, even if we do not intend to. We may return from a vacation abroad to friends asking us why we sound like someone from the place we have just visited. When we find ourselves trying to talk while doing a challenging or attention-diverting task, we may notice that our speech becomes higher-pitched and louder. So, if we are in a situation where we must talk and listen to someone while doing something difficult, will we change how we talk based on our interlocutor’s speech, on how difficult our task is, on both, or on neither?

This dissertation explores the interaction of two factors which often cause talkers’ speech behaviour to change, whether consciously or not: convergence, the tendency to become more similar to the people we talk and listen to, and task difficulty, the fact that talking while experiencing a cognitive workload can affect how we produce and perceive speech. In particular, it examines whether talkers working in a dyad on a difficult, dialogue-intensive task display more or less speech similarity over the course of their conversation than do talkers with an easier task. Convergence is explored in several areas: how listeners rate the vocal similarity of dyads in a harder task over time versus their ratings of talkers’ similarity in easier tasks; how global acoustic analysis techniques measure the dyads’ changing speech similarity in the different difficulty conditions; whether talkers’ similarity in speech rates over time is affected by the difficulty of the task they are working on; and whether talkers’ similarity over time in pausing behaviour changes due to the condition they are in.
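The simplest of these similarity measures can be made concrete with a minimal sketch: convergence is read off a shrinking absolute difference between dyad partners on some per-step measure, such as words per second, between an Early and a Late portion of the conversation. The Python sketch below is purely illustrative; the function and variable names and the toy data are invented here and are not the analysis code used in this study.

```python
# Toy illustration: convergence as a shrinking absolute difference
# between dyad partners on a per-step measure (e.g., words per second).
# All data values below are invented for illustration.

def mean_abs_difference(rates_a, rates_b):
    """Mean absolute per-step difference between two talkers' rates."""
    return sum(abs(a - b) for a, b in zip(rates_a, rates_b)) / len(rates_a)

# Hypothetical words-per-second speaking rates for each dyad partner,
# split into Early and Late portions of the conversation.
early_a, early_b = [2.9, 3.1, 2.8], [2.2, 2.4, 2.1]
late_a, late_b = [2.7, 2.8, 2.6], [2.5, 2.6, 2.4]

early_diff = mean_abs_difference(early_a, early_b)  # larger gap early on
late_diff = mean_abs_difference(late_a, late_b)     # smaller gap later

# A Late difference smaller than the Early difference is read as
# convergence; a larger one as divergence.
print(f"Early: {early_diff:.2f}  Late: {late_diff:.2f}  "
      f"{'convergence' if late_diff < early_diff else 'divergence'}")
```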
This chapter proceeds as follows: Section 1.2 explores the current state of speech convergence research. Section 1.3 examines the research on the effect of task difficulty on speech behaviour. Section 1.4 presents the kinds of questions regarding speech convergence and task difficulty which may be answered by this study. Section 1.5 describes the study used to explore the effect of task difficulty on speech convergence. Section 1.6 presents an outline of the dissertation.

1.2 Convergence behaviour in speech

Speech convergence is a widespread phenomenon characterized by talkers becoming more similar to someone they are talking to, or even to someone they are only listening to. A range of speech and linguistic behaviours are susceptible to convergence, which has led to its study by researchers in a variety of disciplines. At the discourse level, talkers have often been found to converge in how quickly they speak (Street Jr., 1982; Putman & Street Jr., 1984; Levitan & Hirschberg, 2011; Pardo et al., 2013a). Other discourse features on which talkers converge are length of utterances and conversational turns (Putman & Street Jr., 1984; Bilous & Krauss, 1988); frequency and duration of pauses (Natale, 1975b; Bilous & Krauss, 1988); and how much talkers speak and laugh, use backchannels (verbal indications of understanding), and interrupt each other (Bilous & Krauss, 1988). Syntactically and lexically, talkers are often found to adopt the vocabulary, clause structure, and sentence structure of their interlocutor (see e.g., Levelt & Kelter, 1982; Garrod & Anderson, 1987; Garrod & Doherty, 1994; Branigan et al., 2000; cf. Healey et al., 2014), and talkers working to complete a task together often come to use the same referring term to indicate an item (see e.g., Clark & Wilkes-Gibbs, 1986; Brennan & Clark, 1996 [1]). In the acoustic-phonetic domain, talkers converge on various quantifiable measures, including vocal intensity (i.e., loudness; Natale, 1975a); word or phrase duration (Abel et al., 2011); fundamental frequency (Babel & Bulatov, 2012; see also Gregory, 1990; Gregory et al., 1993; Gregory & Webster, 1996); local (Babel, 2010, 2012) and global (Kim, 2012; Lewandowski, 2012) spectral characteristics; and voice onset time in voiced and voiceless stops (Abrego-Collier et al., 2011; Nielsen, 2011; Shockley et al., 2004).

[1] Note that because of the way these studies were structured – i.e., one talker in the task was always directing the other – the two talkers did not necessarily both produce the same words or phrases to refer to the items. In most cases, the director was the one producing the bulk of the speech. Nevertheless, the talkers had to come to an (implicit) agreement on what terms referred to which items, and if the director used a term that their partner had not agreed to, confusion often resulted.

Speech convergence has been found to occur in both conversational and non-conversational contexts. Conversational contexts are usually elicited either through interview-style interactions (e.g., Natale, 1975a, b; Gregory, 1990; Gregory et al., 1993; Gregory & Webster, 1996), or task-based interactions, where two talkers must collaborate to accomplish some goal (e.g., Kim et al., 2011; Lewandowski, 2012; Pardo, 2006; Pardo et al., 2010, 2013a). The task-based interactions have typically used one of two tasks: a map task (Anderson et al., 1991), where one talker must give instructions to the other as to how to trace a route on a map, or a ‘diapix’ task (Van Engen et al., 2010), where each talker in a dyad has a slightly different picture and they must collaborate to identify differences between the pictures. Talkers in these tasks cannot see each other, so all of their interaction is accomplished through speech.
Non-conversational contexts for studying convergence are typified by the auditory naming task (Goldinger, 1998; Namy et al., 2002; Nielsen, 2011; Babel, 2012; Babel & Bulatov, 2012; Kim, 2012), in which talkers produce single words from a pre-set list before and after (either immediately or after some period of time) hearing a recording of a model talker producing the same words.

Convergence is frequently assessed through perceptual judgment tasks, in which listeners judge whether a talker’s productions sound more similar to those of the model they were listening to or the interlocutor they were speaking with after being exposed to the model/interlocutor’s productions than they did before exposure (see e.g., Babel & Bulatov, 2012; Goldinger, 1998; Kim, 2012; Kim et al., 2011; Namy et al., 2002; Pardo, 2006; Pardo et al., 2010, 2013a; Shockley et al., 2004). As Pardo (2013, p. 2) states, “[p]erceptual assessment of phonetic convergence provides a measure that reflects global similarity across multidimensional aspects of acoustic-phonetic attributes simultaneously”. Some recent studies, such as Babel and Bulatov (2012) and Pardo et al. (2010, 2013a), have broadened their explorations of convergence by using both perceptual measures and acoustic-phonetic measures to assess talkers’ changes in similarity over time. This presents a fuller picture of whatever convergence may be occurring, as changes which may be measurable acoustically may not be important – or even available – to perceivers (Pardo, 2013; Pardo et al., 2013b), but may nevertheless be both reliable and acoustically interesting. Researchers such as Kim (2012) and Lewandowski (2012) have begun to use global acoustic similarity measures in place of specific acoustic measures, which allow for a more holistic acoustic analysis than does ‘picking and choosing’ particular acoustic characteristics to examine, and are thus more comparable to a perceptual judgment task. Nevertheless, the perceptual judgment task is still the ‘gold standard’ for assessment of speech convergence, as human listeners perceive and interpret speech as speech, something which is still largely beyond the reach of similarity algorithms.
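As an illustration of what a global acoustic similarity measure can look like in practice, one common recipe is to represent each utterance as a sequence of mel-frequency cepstral coefficient (MFCC) vectors and compare the sequences with dynamic time warping (DTW). The Python sketch below (using the librosa library) shows that general recipe only; it should not be taken as the specific amplitude-envelope or MFCC procedure used in this dissertation, and the file names are placeholders.

```python
# A minimal sketch of one common global-similarity recipe: represent each
# utterance as a sequence of MFCC vectors and compare the sequences with
# dynamic time warping. This illustrates the general technique, not the
# specific implementation used in this dissertation.
import librosa

def mfcc_dtw_distance(path_a, path_b, sr=16000, n_mfcc=13):
    y_a, _ = librosa.load(path_a, sr=sr)
    y_b, _ = librosa.load(path_b, sr=sr)
    mfcc_a = librosa.feature.mfcc(y=y_a, sr=sr, n_mfcc=n_mfcc)
    mfcc_b = librosa.feature.mfcc(y=y_b, sr=sr, n_mfcc=n_mfcc)
    # DTW aligns the two sequences in time; the accumulated cost at the
    # end of the optimal path is a global distance between utterances.
    D, wp = librosa.sequence.dtw(X=mfcc_a, Y=mfcc_b, metric='euclidean')
    return D[-1, -1] / len(wp)  # normalize by warping-path length

# Lower values = more acoustically similar utterances; tracking this
# distance across utterance pairs over the course of a conversation is
# one way to look for convergence. The .wav names are placeholders.
print(mfcc_dtw_distance('talker_a.wav', 'talker_b.wav'))
```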
1.2.1 Proposed causes of speech convergence

Speech convergence has been attributed variously to three causes: a general link between perception and behaviour leading to a tendency towards mimicry; a particularly close coupling between speech production and speech perception; and a desire to accommodate one’s communicative behaviour to that of others. Each posits a somewhat different mechanism of speech convergence, and each mechanism could interact differently with the causes of cognitive workload due to task difficulty. These causes are explored in turn in the sections below.

1.2.1.1 A general link between perception and behaviour

There is some evidence to suggest that humans are born with a tendency to mimic others [2]. Within a month of birth, for example, infants have been found to match the facial expressions of an experimenter (Meltzoff & Moore, 1977; cf. Anisfield, 1996, and Jones, 2009). Speech also exhibits this early tendency towards mimicry: between 12 and 20 weeks of age, babies listening to adult vowel productions have been found to produce vocalizations which both resemble the vowels they hear (Kuhl & Meltzoff, 1996) and mimic the prosodic contours and utterance durations of the vowel productions (Kuhl & Meltzoff, 1982). Similar, apparently unconscious matching of another’s behaviour between adults has been found in a wide range of areas, including speech (as detailed above), posture and movement (e.g., Chartrand & Bargh, 1999; Richardson et al., 2007), and emotional affect (e.g., Zajonc et al., 1982; Bargh et al., 1996).

[2] ‘Mimicry’ and ‘imitation’ are not necessarily the best terms for the speech phenomenon under investigation, as their lay uses often imply purposeful activity. Some researchers (e.g., Chartrand & van Baaren, 2009) make a distinction between the two, with mimicry being unconscious and imitation being conscious; others (e.g., Babel, 2012) refer to certain speech phenomena as ‘spontaneous phonetic imitation’, with an understanding that this is unconscious. As mimicry and imitation are both widely used terms in the literature, they may occasionally come up in this dissertation; as much as possible, however, ‘convergence’ will be used instead.

Spontaneous mimicry has been attributed by John A. Bargh and colleagues to a link between perception and behaviour. The core of this approach is that “the influence of perception on behavioral tendencies is automatic, in that it is passive, unintentional, and nonconscious” (Bargh et al., 1996, p. 233); that is, “[p]erceptual inputs are translated automatically into behavioral outputs” (Dijksterhuis & Bargh, 2001, p. 1). This link is thought to result from a mental overlap between the perceptual and behavioural representations for a particular action, which causes the motor representation of the action to be activated when it is observed in another. Dijksterhuis and Bargh (2001) suggest that from an evolutionary standpoint, this is a very old link, and is thus one that can be moderated by newer cognitive systems.

There are a number of possible advantages to such a perception-behaviour link. Socially, mimicking and being mimicked has been found to lead to liking and being liked, to be a way to demonstrate social affiliation, and to create and enhance empathy (e.g., Chartrand & van Baaren, 2009; see also the discussion in section 1.2.1.3 below). Developmentally, mimicry has long been thought to be one of the key tools children use to acquire skills (see e.g., Piaget, 1951/1999), including the sounds, words, and prosodic features of their language (Kuhl & Meltzoff, 1996). Meltzoff and Moore (1994, p. 83) suggest that for a child, “[i]mitation is to understanding people as physical manipulation is to understanding things”. It is possible to inhibit this activation if there are disincentives to mimicry (e.g., putting oneself in danger or discomfort), or if current operating goals require behaviour that is in conflict with that suggested by the perceptual input (Dijksterhuis & Bargh, 2001). We do not imitate every action we observe, but perception is nonetheless sufficient to create action – and often does.
1.2.1.2 A coupling between speech production and speech perception

Dialogue is not an easy task. Talkers wishing to interact with one or more interlocutors not only have to formulate a message – including semantic, syntactic, and phonetic content – and produce that message, but also have to interpret incoming messages, formulate and produce further messages in response, and so on. Automating these tasks, at least to some degree, would make dialogue easier. One way to do so may be to couple speech comprehension and speech production through mechanisms akin to the perception-behaviour link discussed above. The series of models proposed by Martin J. Pickering and Simon Garrod (Pickering & Garrod, 2004a, b; Garrod & Pickering, 2009; Pickering & Garrod, 2013) has been at the forefront in this area. Pickering and Garrod acknowledge (e.g., 2004a, p. 177) that there is a great deal of similarity between the automaticity in their models and the automaticity of the perception-behaviour link proposed by Bargh and colleagues. One key difference seems to be that the automatic speech production-perception link is in support of an intentional joint action: i.e., in dialogue, “interlocutors have the goal of communicating and realize that their partners also have this goal” (Garrod & Pickering, 2009, p. 295). Other non-speech automatically imitated behaviours seem to be unintentional or incidental: for example, the people who converged in their chair-rocking patterns in Richardson et al. (2007) had no intention of imitating the other’s movements, but it happened nonetheless.

Pickering and Garrod (2004a, b) developed the ‘interactive alignment’ model, which proposed that dialogue is a joint action between talkers which is greatly facilitated by a tight coupling between language production and language comprehension. In this model, coupling is achieved through channels of ‘alignment’ which lead to talkers automatically having the same representation on multiple linguistic levels. Alignment does not happen all at once; rather, the alignment of lower linguistic levels – articulatory, lexical, syntactic – leads to alignment at higher levels: in particular, to alignment of talkers’ situation models, which are their “multi-dimensional representation[s] of the situation under discussion” (Pickering & Garrod, 2004a, p. 172). Alignment is accomplished through priming: “encountering an utterance that activates a particular representation makes it more likely that the person will subsequently produce an utterance that uses that representation” (Pickering & Garrod, 2004a, p. 173). As in Bargh and colleagues’ proposals, it is possible for alignment to be inhibited when it conflicts with a talker’s current goals (Pickering & Garrod, 2004b; Garrod & Pickering, 2004): for example, high-level goals such as correcting an interlocutor’s misunderstanding may override a talker’s low-level alignment to that interlocutor’s previous productions (Pickering & Garrod, 2004b), likely because they are paying more attention to the high-level meaning of their interlocutor’s speech than to the low-level phonetic qualities (Garrod & Pickering, 2004). However, it is expected that such inhibition will be more difficult for a talker than simply aligning.
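The priming mechanism just described can be made concrete with a toy sketch: comprehending a form raises its activation, which in turn raises the probability of producing that form later. The Python fragment below is an invented illustration of that idea only, not an implementation of the interactive alignment model itself; the words, weights, and update rule are all hypothetical.

```python
# Toy sketch of alignment through priming: encountering a form boosts
# its activation, making it more likely to be produced subsequently.
# Words, weights, and the update rule are invented illustrations.
import random

activation = {"couch": 1.0, "sofa": 1.0}   # competing referring terms

def comprehend(word):
    activation[word] += 0.5                # comprehension primes the form

def produce():
    words = list(activation)
    weights = [activation[w] for w in words]
    return random.choices(words, weights=weights, k=1)[0]

comprehend("sofa")                         # partner says "sofa" twice
comprehend("sofa")
# "sofa" (activation 2.0) is now twice as likely as "couch" (1.0) to be
# produced; the estimated proportion below should be near 2/3.
print(sum(produce() == "sofa" for _ in range(1000)) / 1000)
```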
Garrod and Pickering (2009) introduce the idea that the reason for alignment is to allow a talker to predict their interlocutor’s actions; this is further developed in Pickering and Garrod (2013), where speech production and comprehension are taken to be a particular case of action and action perception. In this approach, “people compute action representations during perception and perception representations during action to aid prediction of what they are about to perceive or do, in a way that allows them to ‘get ahead of the game’” (Pickering & Garrod, 2013, p. 332, italics original). This is accomplished within a forward action model, in which an action command generates a copy of itself, called an ‘efference copy’, to be used in predicting the perception of the outcome of the command. The predicted percept is compared to the percept generated by the action, and the result is fed back into the system to determine if changes need to be made to successfully carry out the next action (see Hickok et al., 2011, for a similar model focusing on speech production). Crucially, a listener’s forward action model can generate the same kind of prediction about a talker’s actions, which can then assist the listener in determining how to respond. For Pickering and Garrod (2013, p. 337), imitation is an integral part of dialogue: they propose “that listeners predict speakers’ upcoming utterances by covertly imitating what they have uttered so far, deriving their underlying message, generating efference copies, and comparing those copies with the actual utterances when they occur”.
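The forward-model loop described above can be summarized in a deliberately simplified toy simulation: a copy of each outgoing command is used to predict the sensory outcome, the prediction is compared with the actual outcome, and the error adjusts the internal model before the next action. Everything in the sketch below (the ‘plant’, the gain parameter, the update rule, the numbers) is invented for illustration; it is not a model from Pickering and Garrod (2013) or Hickok et al. (2011).

```python
# Toy illustration of a forward model with an efference copy: an action
# command is issued, a copy of the command is used to *predict* the
# sensory outcome, and the prediction error is fed back to adjust the
# internal model. All functions and numbers are invented.

def plant(command):
    """The real motor system: produces a slightly biased outcome."""
    return command * 0.9          # systematic undershoot

def forward_model(efference_copy, gain):
    """Internal prediction of the sensory outcome of a command."""
    return efference_copy * gain

gain = 1.0                         # initial internal estimate of the plant
command = 5.0
for step in range(5):
    efference_copy = command       # copy of the outgoing command
    predicted = forward_model(efference_copy, gain)
    actual = plant(command)        # percept generated by the action
    error = actual - predicted     # comparison drives adjustment
    gain += 0.5 * error / command  # update the internal model
    print(f"step {step}: predicted={predicted:.2f} "
          f"actual={actual:.2f} error={error:+.2f}")
```

Run as written, the prediction error shrinks on each pass as the internal gain converges toward the plant’s true behaviour, which is the sense in which such a model lets a system ‘get ahead of the game’.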
1.2.1.3 An accommodation of one’s communicative behaviour to that of others

Communication Accommodation Theory (CAT) was introduced by social psychologist Howard Giles and colleagues in the early 1970s, and has continued to evolve to this day. Unlike the approaches discussed above, CAT takes speech convergence to be a non-automatic behaviour used by a talker to decrease the social distance between him/herself and an interlocutor (see e.g., Giles, 1973; Giles et al., 1973; Giles & Coupland, 1991; Giles et al., 1991; Giles & Ogay, 2007). Importantly, convergence is not the only option under the accommodation viewpoint: a talker could instead choose to diverge from an interlocutor in order to increase the social distance between them (e.g., Bourhis & Giles, 1977), or to keep his/her speech behaviour the same. Convergence can be either full or partial: i.e., a talker can choose to exactly [3] match an interlocutor’s speech characteristics, or to move closer to the interlocutor’s speech characteristics without an exact match (Giles et al., 1991). As well, both convergence and divergence can be classified as ‘upward’ or ‘downward’, depending on the social relations between the interlocutors: for example, a lower-status talker can upwardly converge to an interlocutor with higher social status by adopting more prestige forms, while a higher-status talker using even more high-prestige forms with an interlocutor of lower social status would exemplify upward divergence (Giles et al., 1991; Giles & Ogay, 2007).

[3] While this is possible in some areas of speech – for example, in speech rate, where a talker with an average rate of two words per second can conceivably speed up to exactly match an interlocutor’s rate of three words per second – it is more difficult, and perhaps even impossible, in other areas, such as precisely matching an interlocutor’s vowel formants.

From the discussion above, it can be seen that the social function of convergence is the main focus of CAT. When the choice is made to converge, it is usually taken to be driven by a desire to gain the interlocutor’s approval (Giles et al., 1991; Giles & Ogay, 2007). Experimental results support this proposal: for example, talkers who converge to an interviewer’s speech characteristics are rated more positively by those interviewers (e.g., Putman & Street Jr., 1984) and by listeners who hear the interviews (Street Jr., 1982). Convergence may also improve communication effectiveness (Giles et al., 1991; Giles & Ogay, 2007): the increasing similarity between interlocutors’ speech patterns can lead both to increased intelligibility (e.g., Triandis, 1960; Giles & Coupland, 1991) and to increased cognitive organization, by allowing interlocutors “to organize events into meaningful social categories, thereby allowing the complex social situation to be reduced to manageable proportions” (Thakerar et al., 1982, p. 239).

1.2.2 Factors affecting convergence

Speech convergence appears to be a fairly robust and widespread phenomenon, but is nonetheless susceptible to modulation by a range of linguistic, individual personality, social, and task-related factors. Linguistic factors affecting whether talkers become more similar to a model talker or an interlocutor include whether the words being used in an auditory naming task are high- or low-frequency (Goldinger, 1998), and how different the interlocutors’ dialects of English are (e.g., partners who speak the same dialect vs. partners who speak different dialects or have different native languages; Kim et al., 2011). Individual personality factors affecting convergence include the emotional reactivity of a talker (Black, 2012), the tendency of a talker to engage in socially desirable behaviours (Natale, 1975a, b), and the level of phonetic talent a talker displays – that is, how capable a talker is of making the sounds of languages other than their native language (Lewandowski, 2012). Social factors impacting speech similarity include attitudes induced towards the model talker (Abrego-Collier et al., 2011; Babel, 2010, 2012) and social status (Gregory et al., 1993; see also Gregory & Webster, 1996). Task-related factors influencing talkers’ similarity over time include whether a partner in a dyadic interaction task is giving or receiving instructions (Pardo, 2006; Pardo et al., 2010, 2013a), and whether a participant in an auditory naming task is only listening to the model talker or is doing another task at the same time (Abel et al., 2011).

Some factors affecting convergence seem to fall into more than one of these categories. For example, while the gender of a talker is an individual factor, as is the gender of a model talker or interlocutor, the interaction between the genders is at least partially a social construct.
The gender of both the talker and the model talker/interlocutor they are listening to or interacting with has been found to affect whether talkers will become more vocally similar to the model/interlocutor (see Bilous & Krauss, 1988; Namy et al., 2002; Black, 2012). Task difficulty, by its very nature, is strongly related to the nature of the task a talker is engaging in, and also related to individual personality characteristics; however, it has not yet been fully explored in relation to speech convergence. Task difficulty is linked to cognitive 12  workload, which is “the information processing load placed on [a] human operator while performing a particular task” (Lively et al., 1993, p. 2962): the more difficult a task, the greater the cognitive workload. Difficult tasks can also lead to psychological stress, which is different from (but often confounded with) cognitive workload; this dissertation focuses on cognitive workload rather than stress. Task difficulty has been shown to have a number of effects on both speech production and speech perception, although it has generally been assessed in individual speech behaviour rather than in speech in dialogue. To a limited extent, Abel et al. (2011) found an effect of dividing attention – which often increases cognitive load – on whether and how much talkers converged to a model talker’s word duration; however, this study focused more on how attention was being diverted (e.g., through picture drawing versus doing math equations) rather than on how difficult the additional task was (e.g., for many participants, a math task would be harder than a picture-drawing task). As well, a talker’s role in a dyadic interaction task – i.e., whether they are giving instructions to a partner or receiving instructions from their partner – seems to affect whether and to what extent they become more similar to their partner (Pardo, 2006; Pardo et al. 2010, 2013a); this may have some relation to giving instructions being more difficult than receiving them (see e.g., Bortfeld et al., 2001; Pardo et al., 2013a), but that relation between difficulty and convergence has not been fully explored. Thus, whether task difficulty has an effect on convergence largely remains an open question, and one whose answer could potentially enhance our understanding of convergence. In the next section, the effects of task difficulty on speech production and perception are examined, before turning to the potential interaction of task difficulty and speech convergence.  13  1.3 Task difficulty and speech behaviour  The increased cognitive workload resulting from engaging in more difficult tasks has been found to affect both speech production and speech perception on a variety of levels. Just as there are multiple hypothesized causes of speech convergence, there are numerous proposed reasons for these effects, ranging from the physiological to the psychological to the social.  1.3.1 Speech production under cognitive workload  Several studies have examined how increased cognitive workload affects various acoustic measures in speech production, often for the purposes of detecting and alleviating adverse levels of workload in high-pressure work environments (e.g., aviation, emergency response management; see Brenner et al., 1994; Griffin & Williams, 1987; Khawaja, 2010; Huttunen et al., 2011). Subjects in these studies are typically engaged in some type of dual-task situation, where they are asked to do two things at the same time, one of which involves speech. 
For example, subjects may be asked to say words or phrases while performing a task which increases in difficulty, such as a manual (Brenner et al., 1994) or visual (Lively et al., 1993) tracking task, a problem-solving task (Tolkmitt & Scherer, 1986), or an aircraft simulator task (Griffin & Williams, 1987; Huttunen et al., 2011). They could also be asked to silently monitor numbers presented over headphones while reading text passages aloud (Khawaja, 2010). Other types of difficult speech tasks, which are not necessarily used in dual-task scenarios, include tongue twisters and reciting the alphabet backwards (Mendoza & Carballo, 1998).  Increased workload has been fairly consistently associated with speech rate changes – in particular, with decreased duration of words (Griffin & Williams, 1987), syllables (often equated with speaking rate: Brenner et al., 1994; Scherer et al., 2002) and utterances (Lively et al., 1993). 14  The spectral characteristics of speech also seem to be reasonably susceptible to workload-induced changes. Increased amplitude or intensity of speech is quite consistently reported in higher-workload conditions (Griffin & Williams, 1987; Lively et al., 1993; Brenner et al., 1994;  Huttunen et al., 2011). Some studies have found a consistently higher mean fundamental frequency for speakers in higher cognitive workload conditions (Griffin & Williams, 1987; Mendoza & Carballo, 1998; Scherer et al., 2002; Huttunen et al., 2011), but others have not (Hecker et al., 1968; Tolkmitt & Scherer, 1986; Lively et al., 1993). Other spectral characteristics have also been found to be inconsistently affected by workload: for example, Mendoza and Carballo (1998) found decreases in vocal jitter (variability in cycle duration between cycles) and shimmer (variability in speech amplitude between cycles), but Brenner et al. (1994) did not. Lively et al. (1993) note that among their small subject sample, not all subjects displayed the same changes, while Tolkmitt and Scherer (1986) found differences among their subjects based on their gender and their style of coping with anxiety. These results suggest that individuals may adapt idiosyncratically to higher workloads; indeed, Hecker et al. (1968) noted that while task-induced vocal changes varied between talkers, they were quite consistent within talkers. Nevertheless, some effects are more common than others, e.g., speech rate increases. Khawaja (2010) and Khawaja et al. (2008, 2012) have examined the effects of cognitive workload on speech production in collaborative tasks, in addition to speech production by individuals. The collaborative environments were laboratory simulations of a bushfire management task, where levels of cognitive workload were set through a tabletop task, and actual bushfire management training scenarios, where levels of cognitive workload changed randomly from low (little urgency, processes running smoothly) to escalating to high (urgency and high resource coordination demands) through the course of the scenario (Khawaja et al., 15  2012, p. 522). In the real-life scenario, at what times the bushfire management professionals experienced each level of workload was indicated post hoc in transcripts of their conversations. 
Focusing primarily on lexical and grammatical characteristics of speech, it was found that in both situations, talkers in high-workload circumstances used longer sentences compared to those in low-workload circumstances; fewer positive emotion words and more negative emotion words; more words describing mental states (e.g., ‘believe’, ‘think’) and perceptions (e.g., ‘see’, ‘hear’); more words expressing disagreement and fewer expressing agreement; and more plural pronouns than singular ones. In addition, individual speakers were examined for differences in silent and filled (e.g., ‘um’ and ‘uh’) pausing behaviour in a dual-task condition, and were found to pause more in higher workload conditions than in lower workload ones.  1.3.2 Speech perception under cognitive workload Turning now to speech perception in situations of cognitive workload, research in this area has looked at the effects of workload on speech not as a symptom of individuals’ reactions to task difficulty, but as a factor that can alter or impair individuals’ interpretation of incoming speech as a result of increased demands on attention and working memory resources (Mattys et al., 2012). Using various dual-task conditions (e.g., adding a visual search task, a reaction-time task, or an arithmetic task), additional workload has broadly been found to affect which cues listeners can and do use in processing the speech stream. In the studies conducted by Sven Mattys and colleagues, cognitive load appears to generally impair the perception of acoustic cues, meaning listeners rely more on lexical information and probabilities when segmenting the speech signal (Mattys et al., 2009), discriminating phonemes (Mattys et al., 2014), and classifying ambiguous stop consonants (Mattys & Wiget, 2011), even when that information is 16  contrary to the acoustic information in the signal. In the Mattys et al. (2005) hierarchical approach to speech segmentation, lexical-semantic information is more heavily weighted than acoustic information; thus, in their studies, it is the strongest cue that is relied on in conditions of cognitive load. However, when lexical information is not available, workload counterintuitively appears to cause listeners to rely more on weaker acoustic cues than on strong ones: for example, Gordon et al. (1993) found that listeners in a dual-task condition relied more on f0 onset frequency than voice onset time when identifying stop consonants, and more on vowel length than on formant patterns when identifying vowels. Workload also appears to change listeners’ perceptions of vowel length, which impairs their word identification abilities when full lexical information is not available (Casini et al., 2009). As well, studies on artificial languages have found that increased cognitive workload reduces listeners’ ability to use transitional probabilities to segment the speech stream (Toro et al., 2005; Fernandes et al., 2010), although they can sometimes compensate by using coarticulatory cues instead (Fernandes et al., 2010).  1.3.3 Proposed causes of workload-based effects on speech behaviour The reasons posited for workload-induced changes in speech behaviour are varied. 
In terms of non-lexical speech production changes, one school of thought is that the vocal changes are related to physiological changes – e.g., increased heart rate, increased respiration, increased tension in the vocal muscles – resulting from psychological stress due to increased cognitive workload (Tolkmitt & Scherer, 1986; Brenner et al., 1994; Mendoza & Carballo, 1998; Huttunen et al., 2011). Lively et al. (1993), on the other hand, suggest that talkers engage in what Lindblom (1990) describes as ‘hyperspeech’, in which adjustments are made to the speech system which maximize speech intelligibility and discriminability, in order to better function in 17  high cognitive workload conditions. Moving to higher-level speech concerns, Khawaja (2010) and Khawaja et al. (2008) propose that the increased pause rates in the individual dual-task condition allow talkers to “regulate the pace of the information flow such that [they are] able to manage their cognitive load” (Khawaja, 2010, p. 74). Khawaja (2010) and Khawaja et al. (2012) suggest that the lexical and grammatical changes observed in the collaborative tasks resulted from talkers actively trying to manage and share the high-cognitive-load tasks they are engaged in among all team members: for example, the increased use of plural pronouns “support the notion that people actually collaborate and coordinate tasks more with each other during highly complex real-world tasks” (Khawaja et al., 2012, p. 526). Thus, speech production under workload could change due to a variety of factors. In speech perception, the changes in cue use in dual-task settings have primarily been attributed to changes in the availability of attentional and working memory resources due to the increased workload (Gordon et al., 1993; Casini et al., 2009; Mattys & Wiget, 2011; Mattys et al., 2009, 2012, 2014). The suggestion by Mattys and colleagues (Mattys et al., 2009, 2014; Mattys & Wiget, 2011) is that in high-load conditions, listeners are not able to use acoustic cues to segment speech or identify sounds, so must rely on higher-order information – in particular, lexical-semantic information – to interpret the incoming signal. What causes the inability to use the acoustic cues in a dual-task scenario has not been precisely identified. Mattys et al. (2014) suggest that it could be caused by the length of time a listener’s attention is diverted from the speech signal by the other task; thus, if a listener can return their attention to the speech signal fairly quickly, they may be able to use acoustic cues. Alternately, they suggest that if attention is a regulator of the signal-to-noise ratio, as proposed by Gordon et al. (1993), then diversion of attention could reduce the depth or complexity of acoustic processing by giving more weight to 18  weak cues. Finally, following Casini et al.’s (2009) findings that a high-load condition affects listeners’ perception of vowel length, Mattys et al. (2014) suggest the diversion of attention could affect the perceptual sampling rate of the speech signal by disrupting the intake of the signal, and that this effect on the perceived timing of sounds will affect listeners’ identifications of those sounds.  1.4 Does task difficulty affect speech convergence? As discussed above, whether speech convergence is affected by cognitive workload due to task difficulty is an open question. 
There are several possible ways these factors could interact to affect speech behaviour, each of which would have different explanations depending on one’s theoretical approach to convergence and task difficulty, and each of which may have different implications for the theoretical approaches. In addition, exploring how convergence is affected by task difficulty could have a methodological impact on how future task-based convergence studies are designed.   1.4.1 Potential interactions of speech convergence and task difficulty, and possible theoretical explanations  As seen in sections 1.2 and 1.3, there are several models which attempt to explain why the changes caused by speech convergence and cognitive workload occur. Given that convergence and workload have been found to affect some of the same areas of speech – for example, talkers’ speech rate and spectral characteristics – examining the interaction of convergence- and workload-induced effects could further help to enhance our understanding of the causes of these effects. As well, by indicating which models of convergence and workload 19  are compatible with each other, such an exploration may help us refine our models of speech as a whole. In addition, such a study may shed light on which models of convergence are more or less useful for predicting how talkers’ behaviour will change when conditions are not as controlled as in the laboratory – for example, when talkers must deal with higher workload than is found in the typical auditory naming task scenario. It is also likely that these two factors will not interact in isolation: for example, talkers’ cognitive processing styles and/or personal traits  – e.g., openness to new experiences, conscientiousness (John et al., 1991, 2008; Benet-Martinez & John, 1998) – will quite likely affect how they deal with workload and whether they become more similar to an interlocutor. With these possibilities in mind, four scenarios are presented below: task difficulty leading to reduced convergence, task difficulty leading to increased convergence, task difficulty having mixed effects on convergence, and task difficulty not having an effect on convergence.  1.4.1.1 Task difficulty leads to reduced speech convergence  It is perhaps easier initially to imagine the situation in which more difficult tasks lead to talkers becoming less similar to each other over time, whether globally or on specific measures of similarity. In terms of the more automatic approaches to convergence – the perception-behaviour link and the speech production-perception coupling – this could be evidence of additional cognitive workload reallocating attentional resources, as seen in the effects of workload on speech perception (Casini et al., 2009; Mattys & Wiget, 2011; Mattys et al., 2009, 2014). This could lead, for example, to global damping of the perception-behaviour link, in the approach favoured by Bargh and colleagues, if “passive effects of perception are dominated by currently operating goals” (Dijksterhuis & Bargh, 2001, 29). In this approach, such a damping  can occur because the perception-behaviour link is an older cognitive system than the systems 20  being recruited by the current operating goals; the newer systems can override the older system when the need arises. 
Alternatively, if workload leads to only certain elements of the speech stream being processed, as in Mattys and colleagues’ proposals, then only those elements of the stream which are perceived will be available to be imitated; this could lead to, for example, talkers not becoming more acoustically or phonetically similar over time, but still showing convergence in other areas. Similarly, in the early Pickering and Garrod models, difficulty-induced resource reallocation could lead to a non-alignment of the representational channels due to current cognitive needs. If, for example, talkers reduce the amount of attention they are paying to their interlocutor’s speech in order to focus on a difficult task, alignment will not occur, as it requires talkers to be paying attention to the speech stream (Pickering & Garrod, 2004a, b; Garrod & Pickering, 2004). In the newer approach (Pickering & Garrod, 2013), increased workload could lead to a failure to generate the appropriate forward models to allow for prediction of an interlocutor’s speech actions. Alternatively, it may be the case that alignment or imitation is still occurring at some level, but that the effects of attentional resource allocation have changed the level at which it is occurring. For example, if acoustic information is less available when processing the speech stream due to workload, and lexical knowledge then becomes the primary processing mechanism (Mattys & Wiget, 2011; Mattys et al., 2009, 2014), we might expect to find reduced acoustic-phonetic alignment in high workload conditions.

In terms of Giles’ Communication Accommodation Theory (CAT), additional workload could lead to stress or anxiety (see e.g., Gudykunst, 1995; Giles & Ogay, 2007), leading to a lack of social synchrony between talkers and a failure to converge. Alternatively, if the stress or workload is seen (rightly or not) as emanating from the other talker, a talker could choose to maintain individual speech patterns or to diverge from the other talker to express their social displeasure.

As well, in all approaches, if talkers’ different physical and/or emotional responses to the stress of increased workload result in different changes in speech behaviour, as suggested by the diverse findings on fundamental frequency changes discussed in section 1.3.1, evidence of convergence may be suppressed; for example, if a talker is converging towards a partner’s lower f0 but her own f0 is increasing due to task difficulty, she may appear to be diverging.

1.4.1.2 Task difficulty leads to greater speech convergence

More difficult tasks leading to talkers becoming more similar to each other, whether globally or in specific domains of speech, is perhaps a more counter-intuitive outcome than difficulty leading to less convergence. Nevertheless, it is an outcome which could be explained within the existing models of convergence and workload. From the convergence viewpoint, such an explanation might be easiest in a Communication Accommodation approach, where increased workload in a collaborative situation could – and perhaps should – lead to talkers becoming more similar, either through a desire to communicate more effectively (Thakerar et al., 1982; Giles & Coupland, 1991; Giles & Ogay, 2007) and/or, as suggested by the findings of Khawaja and colleagues (Khawaja, 2010; Khawaja et al., 2012), to share the increased task load among the interlocutors, which could be better accomplished if the social distance between them is decreased.
In this case, talkers would use increasing similarity as a strategy for overcoming the difficulty of the task. From a Pickering and Garrod viewpoint, increased workload could be a motivation to increase alignment or the accuracy of forward modeling, which would then increase the effectiveness of the joint action. In 22  a perception-behaviour link approach, increased workload could perhaps suppress the factors which might inhibit mimicry. From a workload viewpoint, if talkers do tend to display the same kinds of speech behavioural changes due to increased task difficulty, whether for physiological or psycho-social reasons – e.g., increasing speech rate (Griffin & Williams, 1987; Lively et al., 1993; Brenner et al., 1994; Scherer et al., 2002) or fundamental frequency (Griffin & Williams, 1987; Mendoza & Carballo, 1998; Scherer et al., 2002; Huttunen et al., 2011) – then what appears to be ‘convergence’ may fall out naturally as a result of these changes. The same type of effect may be found if talkers engage in hyperspeech (Lindblom, 1990) in high-workload conditions, as per Lively et al.’s (1993) suggestion: if they use similar techniques to maximize intelligibility, their similarity should naturally increase over time. In a case like this, it may be difficult to tell which changes are a result of convergence and which ones are a result of increased workload. However, if talkers display different behavioural changes due to workload, as suggested by Hecker et al. (1968) and Lively et al. (1993), and the talkers become more similar over time despite their different reactions to workload, then the changes over time would likely be due to convergence rather than workload.  1.4.1.3 Task difficulty shows mixed effects on speech convergence  As suggested in section 1.4.1.1, it may be the case that increased cognitive workload will cause talkers to converge more on certain elements of speech and less on others. Thus, for example, if workload affects speech perception by reducing listeners’ ability to process acoustic information, leading them to rely on lexical-semantic information to interpret the speech stream 23  (Mattys et al., 2009, 2014; Mattys & Wiget, 2011), then we would expect to see increased similarity over time in the lexical area but reduced similarity in acoustic-phonetic areas.  1.4.1.4 Task difficulty shows no effect on speech convergence  A final possibility is that task difficulty will not be found to have any reliable or consistent effect on whether talkers become more similar to each other over the course of their interaction. This could be the result if speech convergence is a sufficiently robust behaviour that it is not disrupted by increased cognitive workload: for example, if the perception-behaviour link or the production-perception coupling is not affected by attentional resource allocation, or if the desire to reduce social distance is sufficiently strong that convergence will occur even under adverse conditions. This could also be the case if other factors have a greater impact on talkers’ convergence behaviour than does task difficulty. As discussed in section 1.2.2, there are multiple individual personality, social, task-related, and linguistic factors that have been found to affect convergence, and there are likely others that have not yet been identified. Any of these factors could very well have a more significant effect on whether talkers become more similar over time than cognitive workload does.  
1.4.2 Methodological implications of task difficulty for studying speech convergence  By many lay standards, most of the convergence studies discussed in section 1.2 would be categorized as involving ‘simple’ tasks, even when being undertaken in an unfamiliar laboratory setting: e.g., saying words, listening to someone else say them, and saying them again. In the case of Abel et al. (2011), while the additional tasks in which subjects engaged could be somewhat challenging (e.g., solving math equations), the essential convergence-eliciting task 24  was a basic auditory naming task. However, in the case of the dyadic interaction studies, the tasks which participants are engaging in while conversing – describing and replicating a route on a map, identifying differences between two pictures – are not necessarily ‘easy’ ones.   Methodologically, the growing trend towards using more natural speech materials to analyze speech convergence specifically and speech behaviours more generally (Kim et al., 2011; Lewandowski, 2012; Pardo, 2006; Pardo et al., 2010, 2013a; Van Engen et al., 2010; inter alia) is one which will ideally lead to more ecological validity in this research than decontextualized work alone. However, it is also an approach which can introduce more sources of variation – and thus potential confounds – than decontextualized work. The greater the understanding of what these sources are, the more they can be taken into account when designing studies and interpreting findings.   For example, the dyadic interaction tasks used in previous conversational acoustic-phonetic convergence studies – the map task used by Pardo (2006) and Pardo et al. (2010, 2013a), and the diapix task used by Kim et al. (2011) and Lewandowski (2012) – have not reported taking into account the potential difficulty of the tasks for the talkers completing them. In the map task, developed by Anderson et al. (1991), each partner has a specific role: the ‘Giver’ has a route drawn on her version of the paper map and must communicate that route to her partner; the ‘Receiver’ does not have the route on her map and must work with the Giver to successfully fill it in. However, it is not clear that both roles are equally difficult. In particular, the Giver may have the more difficult task: Wright and Hull (1990) suggest verbal instruction-giving is a skill that not all talkers have, nor do talkers necessarily have an understanding of what would help their listeners follow their instructions. There is also no guarantee that the Giver will get feedback to help them improve the quality of their instructions; the accuracy of the 25  transmitted route is assessed by the experimenters after the task has been completed, and the Receiver is more likely to engage in a process of negotiation with the Giver to establish what should be done (see e.g., Brennan & Clark, 1996, for examples from lexical entrainment research) than to suggest ways the Giver can improve her instructions. It is also not clear that map reading/interpreting and direction giving/receiving are of equal difficulty for all potential map-task participants: Lobben (2004) observes that neither psychologists nor cartographers have determined why some people can read maps better than others. As well, Montello et al. 
(1999) demonstrate that although the folk wisdom that men are better at spatial tasks – including map reading – is false, there are gender differences in terms of ability in various spatial and navigational tasks, which could affect how difficult male and female participants find the task. Van Engen et al.’s (2010) diapix task was designed to induce more balanced dialogue than in the map task, in terms of both the types of utterances produced by the talkers (the map task tends to be imperative-heavy) and how much each talker contributes to the conversation (the conversation in the map task tends to be dominated by the Giver). Thus, the diapix task is set up in a more symmetrical way than the map task: each talker has a paper version of a picture representing a particular scene (e.g., a street scene, a beach scene), with 10 differences between the pictures (omissions, colour/sign changes, etc.) which were designed to be identifiable by all participants. The talkers cannot see each other’s pictures, and must work together to identify the 10 differences. In compiling the Wildcat Corpus of native- and foreign-accented English from which Kim et al. (2011) worked, the experimenters gave the pairs 20 minutes to find all 10 differences; after 20 minutes, the experimenters ended the session because participants were usually getting frustrated and the conversation was breaking down. It thus does not appear to have been the case that all participants and all dyads found the task equally easy; however, no 26  indication is given of which dyads, out of the 38 in the study, did not complete the task before reaching the frustration point. A further complication in this corpus is the fact that various combinations of L1 and L2 speakers of English made up the dyads. It may be the case that non-native speakers of English would find the task more difficult than native speakers. In work on lexical entrainment between native and non-native speakers of English, using a dyadic card-matching task followed by an individual picture-labelling task, Bortfeld and Brennan (1997) observed that the non-native speakers found the matching task very difficult, which may have affected their performance on the labelling task. Why this may be is not clear; however, not knowing how inherently difficult a task is makes it hard to determine how that difficulty may affect a subject’s performance, or measures related to their performance, on the task. The study reported in this dissertation used a task which is, at its most basic level, easy enough for a young child to do, but which can be relatively simply scaled up to become quite difficult.  1.5 The current study: Testing the effect of task difficulty on speech convergence   The study presented in this dissertation tested the effect of task difficulty on speech convergence. To accomplish this, a new dyadic interaction task was developed which uses LEGO® construction toys: each talker in a dyad receives an identical set of LEGO pieces and half of a set of instructions to build a particular construction. The talkers alternate giving instructions until the task is completed – i.e., until they believe they have both built the same construction. This task is similar in some ways to the map task used by Pardo (2006) and Pardo 27  et al. (2010, 2013a)4, and to the diapix task used by Kim et al. 
(2011) and Lewandowski (2012), in that the talkers in the dyads cannot see what their partner is doing, and in that each talker should contribute to the discussion in order to successfully complete the task. However, in this task, each talker is responsible for giving half of the instructions, meaning that the distribution of workload is more equal than in the map task, and potentially more equal than in the diapix task, where one talker could conceivably contribute less than the other.

Three levels of task difficulty were created, based on the number of pieces in each step of the construction task: the Easy condition had two pieces per step (18 steps total), the Medium condition had three pieces per step (12 steps total), and the Hard condition had four pieces per step (10 steps total). The resulting corpus of 15 dyadic interactions – five at each level of difficulty – was analyzed in four ways, any of which could be expected to show evidence of convergence given the previous literature. As indicated in sections 1.2 and 1.3, there are many possible dimensions along which talkers’ speech changes due to convergence and workload could be assessed; the four chosen for this dissertation represent only a small number of those dimensions, but they provide both global assessments of changes in talkers’ speech over time and more specific examinations of behaviour which may show changes due to either convergence or task difficulty.

4 The map task bears similarities to another type of LEGO-based task used by Clark & Krych (2004) and Krych-Applebaum et al. (2007), in which a director instructed a matcher as to how to build a number of DUPLO® constructions. In those studies, manipulations included whether the director and matcher could see each other’s faces, and whether the director could see what the matcher was building in response to their instructions; this is different both from the current study and from the map and diapix tasks, in which the partners are never able to see each other or each other’s maps/diagrams/constructions.

The first two analyses were ones which are established ways of assessing convergence, as discussed in section 1.2, but for which there was no clear prediction of behaviour from the cognitive workload literature:

(1) Perceived similarity over time. As discussed in section 1.2, listeners’ judgments of whether talkers become more similar in their speech production over time are the ‘gold standard’ of convergence assessment. Thus, it was key to this study to include this analytical technique to get an overall assessment of whether task difficulty affects convergence. A set of listeners who had not participated in corpus collection were asked to rate the similarity of the voices of the talkers in each dyad in a perceptual judgment task. Nine utterances were used from each dyad, taken from the early, middle and late stages of their conversation.

(2) Acoustic similarity over time. The utterances used in the perceptual judgment task were also submitted to two acoustic similarity analyses, one using amplitude envelope measures (following Lewandowski, 2012) and one using mel-frequency cepstral coefficient (MFCC) measures (following Kim, 2012). These two analytic techniques provided an overall indication of whether talkers became more spectrally similar over time without using a large number of individual acoustic measures. (A sketch of how such utterance-pair similarity measures can be computed is given below.)
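To make these two global measures concrete, what follows is a minimal sketch of one way an amplitude-envelope correlation and an MFCC-based distance could be computed for an utterance pair. It assumes Python with the numpy, scipy, and librosa libraries, and the file names are placeholders; it illustrates the general technique rather than the exact procedures of Lewandowski (2012) or Kim (2012) as implemented in Chapter 4.

# Minimal sketch of the two global similarity measures described in (2).
# Assumes Python 3 with numpy, scipy, and librosa installed; file names
# are placeholders, not files from the actual corpus.
import numpy as np
import librosa
from scipy.signal import hilbert, resample

def amplitude_envelope(wav_path, n_points=500):
    """Amplitude envelope of an utterance, resampled to a fixed length so
    that utterances of different durations can be compared point-for-point."""
    y, sr = librosa.load(wav_path, sr=16000)
    envelope = np.abs(hilbert(y))          # magnitude of the analytic signal
    return resample(envelope, n_points)

def envelope_similarity(path_a, path_b):
    """Pearson correlation between two talkers' envelopes; higher values
    indicate greater similarity."""
    a, b = amplitude_envelope(path_a), amplitude_envelope(path_b)
    return np.corrcoef(a, b)[0, 1]

def mfcc_distance(path_a, path_b, n_mfcc=13):
    """Dynamic-time-warping distance between two MFCC sequences; lower
    values indicate greater spectral similarity."""
    ya, sr_a = librosa.load(path_a, sr=16000)
    yb, sr_b = librosa.load(path_b, sr=16000)
    ma = librosa.feature.mfcc(y=ya, sr=sr_a, n_mfcc=n_mfcc)
    mb = librosa.feature.mfcc(y=yb, sr=sr_b, n_mfcc=n_mfcc)
    cost, _ = librosa.sequence.dtw(X=ma, Y=mb, metric='euclidean')
    return cost[-1, -1] / (ma.shape[1] + mb.shape[1])  # length-normalized

# One utterance pair from a hypothetical dyad:
# print(envelope_similarity('talker_a_early.wav', 'talker_b_early.wav'))
# print(mfcc_distance('talker_a_early.wav', 'talker_b_early.wav'))

Whatever the precise implementation, each utterance pair yields a single similarity value, and it is the trajectory of such values across the early, middle, and late stages of a conversation that the global analyses track.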
The goals of these global analyses were twofold:

a) Would the dyads show evidence of convergence over the course of their interaction – i.e., an increase over time in perceivers’ similarity ratings and/or in the acoustic similarity values as measured by amplitude envelopes or MFCCs?

b) If increased similarity was found over time, would it be affected by the difficulty of the condition a dyad was in?

The third and fourth analyses looked at phenomena which have previously been examined in terms of both convergence and cognitive workload, and which have been found to be affected by both of these factors:

(3) Speech rate similarity over time. Talkers’ speech rates were examined to determine whether dyad partners became more similar over the course of their interaction in terms of how quickly they spoke. Both words per second and syllables per second rates were analyzed.

(4) Pausing behaviour over time. Talkers’ pausing patterns – pause rate and pauses as a percentage of speaking time – were examined to determine whether dyad partners became more similar over the course of their interaction in how frequently and for how long they paused. Both silent pauses and filled pauses (utterances such as ‘uh’ and ‘um’) were analyzed.

In these analyses, three questions were at issue:

a) Would the dyads in the more difficult conditions show a higher speaking rate and/or a higher pause rate or pause percentage than those in the Easy condition?

b) Would the dyads become more similar in speech rate and/or on pause rate/percentage, as shown by a decrease in the absolute difference in their speech rates and/or pause rates/percentages over the course of the conversation? In the case of speech rate, following the measures in Pardo et al. (2010, 2013a), would the dyads show a correlation in their speech rates over time?

c) If the dyads became more similar in speech rate and/or pause rate/percentage, would that be affected by the difficulty of their task?

In all of these areas, increased similarity over the course of a dyad’s interaction could suggest that the talkers were engaging in some type of convergence or alignment behaviour.

1.6 Outline of the dissertation

The remainder of this dissertation proceeds as follows.

Chapter 2 explains the methodology used in the collection of the construction task corpus, including the construction task itself, the personality and cognitive measures collected before the construction task, the participants involved, and the preparation of the audio files for analysis. As well, information about the conversations in the corpus is provided, including time to completion and error rates for each condition and each dyad, and differences in speaking time between steps in which talkers were giving instructions and those in which they were receiving instructions. Finally, samples from one conversation are presented in transcript form, to give an idea of the nature of the conversations in the corpus.

Chapter 3 presents the perceptual judgment task used to measure listeners’ assessments of vocal similarity between the talkers in each dyad in the construction task, and the results of the task.

Chapter 4 details the two global acoustic similarity analyses used to measure whether the talkers in each dyad were becoming more spectrally similar over time, and the results of those analyses.

Chapter 5 presents the analysis of speech rate convergence, which looked at both words per second and syllables per second. (One possible operationalization of the rate and pause measures in (3) and (4) above is sketched below.)
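The following is a minimal sketch of how the rate and pause measures just described could be operationalized. The per-interval data structure, function names, and all numbers are illustrative assumptions, not the implementation or values of Chapters 5 and 6.

# Hypothetical sketch of the speech rate and pause measures in (3) and (4).
# Assumes each talker's speech has been divided into timed intervals with
# word (or syllable) counts; all data below are illustrative placeholders.
import numpy as np

def rate_series(unit_counts, durations):
    """Per-interval speech rate, e.g., words (or syllables) per second."""
    return np.array(unit_counts, dtype=float) / np.array(durations, dtype=float)

def rate_difference(rates_a, rates_b):
    """Absolute rate difference per interval; convergence predicts that
    this difference shrinks over the course of the conversation."""
    return np.abs(rates_a - rates_b)

def rate_cross_correlation(rates_a, rates_b):
    """Normalized time-series cross-correlation of two talkers' rates (cf.
    the analysis attributed to Pardo et al., 2010, 2013a); a positive peak
    near lag 0 suggests coupled speech rates."""
    a = (rates_a - rates_a.mean()) / rates_a.std()
    b = (rates_b - rates_b.mean()) / rates_b.std()
    return np.correlate(a, b, mode='full') / len(a)

def pause_stats(pause_durations, speaking_time):
    """Pause rate (pauses per second of speaking time) and pauses as a
    percentage of speaking time."""
    rate = len(pause_durations) / speaking_time
    percentage = 100.0 * sum(pause_durations) / speaking_time
    return rate, percentage

# Illustrative per-interval word counts and durations for one dyad:
a = rate_series([12, 15, 11, 14], [4.0, 5.0, 3.5, 4.2])
b = rate_series([10, 14, 12, 14], [3.8, 5.1, 3.9, 4.0])
print(rate_difference(a, b))        # does the gap narrow over time?
print(rate_cross_correlation(a, b))
print(pause_stats([0.4, 0.7, 0.3], speaking_time=60.0))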
Differences in speech rate between the conditions are examined, as are the changes in the absolute difference between dyad members’ speech rates 31  over time. As well, following Pardo et al.’s (2010, 2013a) lead, dyad partners’ speech rates are submitted to a time-series cross-correlation analysis. Chapter 6 details the analysis of pause rate and pause frequency convergence, which examined both silent pauses and filled pauses. Differences in silent pauses per second, silent pause percentage, filled pauses per second, and filled pause percentage between the conditions are examined, as are the changes in the absolute difference between dyad members’ pausing values over time. Chapter 7 includes a general discussion of the results of the study, including its limitations, and presents directions for further research.  32  Chapter 2: Corpus collection 2.1 Introduction The material used for the examination of the effect of task difficulty on speech convergence described in this thesis is a corpus of conversations between dyads working on tasks of different levels of difficulty. The types of conversations desired were ones in which participants would need to collaborate to achieve a goal, similar to the ones engendered by the HCRC map task created by Anderson et al. (1991), which has been used extensively in Pardo and colleagues’ work (Pardo, 2006; Pardo et al., 2010, 2013a, 2013b) on acoustic-phonetic convergence, and the diapix (finding differences between two pictures) task used in creating the Wildcat corpus (Van Engen et al., 2010), which was used by Kim et al. (2011) in their examination of convergence between talkers from different dialect backgrounds, and by Lewandowski (2012) in her study of the effect of phonetic talent on convergence between talkers of different native languages. In addition, the task which participants would undertake would need to be one in which the quantification and measurement of difficulty would be relatively straightforward: e.g., through the number of steps to be completed or goals to be achieved in the task. As well, the task should not require a great deal of prior knowledge, skills, or practice on the part of the participants in order to be successfully completed. Finally, a task which the participants would find engaging and interesting was highly desirable; as Anderson et al. (1991, p. 356) suggest, “[i]ntensive involvement with the task in hand distracts speakers’ attention away from their language”.  LEGO® construction toys were used as the basis of the task. In terms of the desired criteria to be met in a dyadic interaction task in which difficulty is controlled for, LEGO toys appear to fit the bill quite well. Owens et al. (2008, p. 1945) note that “LEGO is a highly 33  structured, predictable and systematic toy”, and LEGO playsets are inherently designed to become more difficult as a child grows older; sets for younger children have fewer pieces and relatively simple instructions, while sets for older children have many more pieces and more complex instructions. The variable number of pieces involved and the typical layout of the instructions – in which one can determine relatively easily how many pieces need to be added in each step – means that difficulty is quantifiable in at least a basic way. 
For example, a task involving 12 LEGO pieces – all of different types and colours, with six total steps in which two bricks are added in each step – is likely to be easier than a task involving 30 LEGO pieces, not all of which are identifiable by one feature only (e.g., ‘the blue one’), with six total steps in which five bricks are added in each step. There are approximately 7000 unique LEGO pieces (Lauwaert 2008), meaning that a great deal of flexibility is available in designing tasks for participants. An additional benefit is that a LEGO-based task is likely to be engaging to the participants; work on the use of LEGO bricks and systems in educational settings suggests that even adult learners find working with LEGO toys to be enjoyable (Freeman, 2003; Klassner and Anderson, 2003; Cliburn, 2006). While LEGO systems have been widely used in educational contexts, particularly in mathematics, technology, and computer science (see e.g., the work of Seymour Papert and Mitchel Resnick in the MIT Media Lab, which led to the LEGO MINDSTORMS robotics systems: Papert, 1980; Resnick et al., 1988, 1996), and to some extent in therapeutic contexts for autism spectrum disorders (LeGoff, 2004; LeGoff & Sherman, 2006; Owens et al., 2008), their use as an experimental vehicle has not been extensive. Examples of the use of LEGO in experiments are Fawcett and Perkins (1981), who used a free-play house-building task to look at language development in school-age children, and Clark and Krych (2004) and Krych-Applebaum et al. (2007), who used LEGO (technically, DUPLO®) blocks in a 34  director-builder matching task somewhat similar to the map task used by Pardo (2006), and Pardo et al. (2010, 2013a). The corpus was intended not only to address the questions asked in this dissertation – i.e., whether task difficulty has an effect on perceived convergence, global acoustic similarity measures, and speech and pause rate convergence – but also to be useful in future research investigating speech convergence in various ways. For this reason, data was collected which has not been used in this dissertation – i.e., the personality and cognitive measures which the participants completed prior to completing the construction task, and the video data of the participants in the construction task. The results of the personality and cognitive measures are given in Appendix B. The remainder of the chapter will proceed as follows: Section 2.2 describes the methods used in collecting the construction task corpus, including the process of designing the construction task (2.2.1), the participants (2.2.2), the personality and cognitive measures which were collected and the procedure for collecting them (2.2.3), the construction task procedure (2.2.4), and the preparation of the audio files, which are the basis of the analyses in Chapters 3-6 (2.2.5). Section 2.3 provides results, including statistics on time required to complete the task and on error rates (2.3.1) and speaking time (2.3.2), as well as samples of the conversational interaction of one dyad in transcript form (2.3.3). Section 2.4 is a summary.  2.2 Methods 2.2.1 Construction task design In designing the construction task, a number of criteria were deemed to be important. 35  (1) Each participant should have the opportunity to both give and receive instructions. In Pardo (2006) and Pardo et al. 
(2010, 2013a), a talker’s role as giver or receiver affected whether and how much they converged to their partner; the current study was thus designed to place each talker in each role. (2) Each participant should have an equal number of turns. (3) Each step should have an equal number of pieces. (4) The number of pieces should be such that the number of steps and pieces per step could be varied while still using the same design and allowing each participant to have the same number of turns. For example, an 18-piece design would allow for six steps with three pieces each, with each participant being the instruction-giver for three steps; however, if only two pieces were used in each step of that construction, there would be nine steps, meaning that each participant would not have an equal number of opportunities to give instructions. (5) There should be enough steps and/or pieces involved to allow for the collection of a reasonable amount of conversation; based on the tasks used in Kim et al. (2011), Pardo (2006), and Pardo et al. (2010, 2013a), 30-40 minutes of conversation seemed to be desirable. (6) The design and piece selection should create opportunities for the repetition of lexical items – e.g., colours, shapes, descriptors, etc. – which could be compared against each other in later analyses. (7) Factors should be incorporated which would promote various kinds of conversation; e.g., question-and-answer sequences, descriptive sequences, disambiguation sequences, repair sequences, etc. 36  Prior to beginning data collection from the construction task, factors which could affect the difficulty of the task were explored by having pairs of volunteers collaborate to build several preliminary designs. The volunteers were demographically similar to the participants in the final task (female native speakers of English between the ages of 18 and 25). Designs were tested which had different numbers of steps, different numbers of pieces per step, and extra pieces or no extra pieces. The structure of the task was similar to that in the final construction task (see section 2.2.4): Each participant had half of the steps necessary to complete the construction; the participants alternated between giving and receiving instructions; the participants could look back at previous instructions, but could not look forward through the instructions; and there was a barrier between the participants which prevented them from seeing each other. Differences between this pilot task and the final task were that the experimenter was in the room while the participants were completing the task; multiple structures were built in one session; and the participant who gave the final instruction checked their partner’s construction for errors and then instructed her on how to fix those errors. The final designs for and structure of the construction task were informed by observations of these sessions and feedback collected from the session participants. All LEGO designs in the construction task were created using the LEGO Digital Designer 4.3 program (available from http://ldd.lego.com/en-us/, retrieved November 4, 2012). Basic information about each design is given in Table 2.1. The full designs are provided in Appendix A; the first two steps of each design are shown in Figure 2.1.   
Table 2.1 Design information for the LEGO construction task

Design   Pieces   Steps   Pieces per Step   Instruction-Giving Turns per Partner   Anticipated Difficulty
1        36       18      2                 9                                      Lowest
2        36       12      3                 6                                      Medium
3        40       10      4                 5                                      Highest

[Figure 2.1 First two steps of each design. A.: Easy condition. B.: Medium condition. C.: Hard condition. Full designs are given in Appendix A.]

The same pieces were used in Designs 1 and 2. Design 3 incorporated the same pieces as Designs 1 and 2, as well as four additional pieces; this allowed the instructions to incorporate four pieces per step while still having an equal number of instruction-giving turns per partner. Each participant received a bag including the 36 pieces required to complete all three designs, the four pieces required to complete Design 3, and 10 pieces which were not required to complete any of the designs, for a total of 50 LEGO pieces in each bag. Thus, for Designs 1 and 2, 14 of the pieces in the bag were ‘extra’ pieces; for Design 3, only 10 of the pieces were extra. A full inventory of the pieces included in each bag – both required pieces and extra pieces – is given in Appendix A.4.

‘Extra’ pieces: The ‘extra’ pieces were selected to create the possibility for situations in which participants would need to distinguish a required piece from an extra piece; it was hoped that this would encourage more discussion between the participants, and might also create the possibility for participants to make and fix errors. It was observed in piloting and reported by the piloting participants that the inclusion of extra pieces did not necessarily make the task more difficult. Including extra pieces, however, did mean that participants could not simply ‘coast’ through the last step of the task; they had to continue giving their partner enough detail to be able to pick out the correct pieces, and so had to approach the last step in the same way as they approached the other steps. (See Clark & Wilkes-Gibbs [1986] for an example of a task where the last step was essentially a ‘no-brainer’.) Each extra piece matched at least one required piece in terms of colour and/or shape.

Layout: It was observed in the pilot sessions that building vertically – i.e., stacking pieces on top of each other – was easier for the participants than building horizontally – e.g., placing pieces next to each other and then connecting them, or attaching pieces to the sides of other pieces (compare, e.g., Steps 11 and 12 of Design 1 in Appendix A.1). For this reason, the designs included both vertical and horizontal building.5

5 Should this research be expanded in the future, we hope to be able to incorporate the as-yet unpublished work of Ron Rensink and his collaborators at UBC, which investigates the interpretability of picture-based sequential assembly instructions involving LEGO toys (Ron Rensink, personal communication).

Colour choices: As mentioned above, one of the desired design criteria was having repeated lexical items to compare from various times in the interaction. Colour terms would be quite amenable to repetition, as they would not only be used when first referring to a piece, but also when talking about where to place a piece in relation to other pieces, distinguishing a piece from other pieces of the same shape, etc. A larger number of pieces of two colours – blue and yellow – were used to allow for the collection of multiple tokens of these lexical items at various points in the conversation; if the colours were distributed evenly – e.g., three or four pieces each of 10 or 11 different colours – it might not be possible to get multiple tokens from both participants in the conversation at multiple points in time.
Blue was chosen as one of these colours (comprising 8 of 36 pieces in Designs 1 and 2, 9 of 40 pieces in Design 3, and two of the extra pieces) because of the change in progress in certain varieties of North American English – including those spoken in Canada, and in the Pacific Northwest in particular – involving fronting of the GOOSE (Wells, 1982) vowel (see e.g., Labov et al., 2006; Boberg, 2008). Blue is also one of the colours with multiple shading options available in the LEGO piece inventory, which allows for even more possibilities of distinguishing pieces (e.g., light blue vs. medium blue vs. dark blue). Yellow was chosen as the other primary colour (comprising 7 pieces in all designs, plus 1 extra piece) because of the potentially interesting changes that could be observed in two-syllable words vs. one-syllable words, such as vowel reduction in the second syllable. Of the pieces used, orange and purple occurred only once, and brown was required only in the Hard condition.

‘Weird’ pieces: Including pieces that were not ‘standard’ LEGO blocks – e.g., not 2x2 bricks, or 2x4 bricks – meant that the instruction-giver would have to find ways to describe them so that the instruction-receiver could identify them. These pieces were often described by the participants as being “weird”. Pieces that were harder to describe were included in the later stages of the designs; it was felt to be necessary to create a physical base for participants to work on before adding more complex pieces. This also meant that they had several steps on which to work together and potentially establish some kinds of reference before getting to those pieces. More potentially ‘weird’ pieces were incorporated in Design 3 than in Designs 1 and 2, although those ‘weird’ pieces were extra pieces for Designs 1 and 2.

Changing perspective in instruction pictures: Because a particular number of pieces were being added in each step – two, three, or four – it was necessary to ensure that participants could see all of the pieces that were being added in each step in the design. Thus, when necessary, the design was rotated (using LEGO Digital Designer) to an angle at which all of the pieces being added could be at least partly seen, while keeping occlusion to a minimum. (Compare, e.g., steps 9 and 10 of Design 1 in Appendix A.1.)

The final instructions were printed in colour, single-sided, on white 8.5" x 11" paper in landscape orientation. Each step was on its own sheet. The number of the step was printed in black ink in the top left-hand corner. Each sheet was enclosed in a plastic sheet protector; the protectors for one set of instructions were held together with a metal ring, meaning participants had to flip the page to get to their next step. Each set of instructions had a cover page; the cover page for the odd-numbered steps read “Instruction Set 1 (please do not flip this page until you and your partner are ready to begin)”, and the cover page for the even-numbered steps read “Instruction Set 2 (please do not flip this page until your partner has completed giving the first instructions)”.
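The interaction of design criteria (2)-(4) in section 2.2.1 reduces to simple arithmetic: the total piece count must divide evenly by the pieces-per-step, and the resulting number of steps must itself be even so that both partners give the same number of instructions. A minimal sketch of that check, using the values from Table 2.1; the function name and structure are illustrative only:

# Illustrative check that a candidate design satisfies criteria (2)-(4):
# steps of equal size, and a step count that splits evenly between the
# two partners.
def design_turns(total_pieces, pieces_per_step):
    if total_pieces % pieces_per_step != 0:
        return None                 # steps would not have equal sizes
    steps = total_pieces // pieces_per_step
    if steps % 2 != 0:
        return None                 # partners would get unequal turn counts
    return steps, steps // 2        # (steps, instruction-giving turns each)

print(design_turns(36, 2))   # Easy:   (18, 9)
print(design_turns(36, 3))   # Medium: (12, 6)
print(design_turns(40, 4))   # Hard:   (10, 5)
print(design_turns(18, 2))   # None: nine steps cannot be split equally
print(design_turns(36, 4))   # None: nine steps again

The last line shows why Design 3 required 40 pieces rather than 36: with four pieces per step, 36 pieces would yield nine steps, which cannot be divided equally between the two partners.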
2.2.2 Participants Participants were recruited via visits to UBC Linguistics classes, as well as through the UBC Psychology Graduate Student Council’s Paid Participant Studies List (http://gsc.psych.ubc.ca/studies/paid_studies.html) and through members of the UBC Speech in Context lab. Participants had to be female; gender has been found to have an effect on convergence and accommodation behaviour (Bilous & Krauss, 1988; Black, 2012; Namy et al., 2002; Pardo et al., 2010), so this corpus was restricted to a single gender. To minimize dialectal variation to some degree, participants had to be self-reported native speakers of Canadian English between the ages of 18 and 25. The age requirement also helped to ensure participants would have similar familiarity with LEGO toys (see e.g., Chambers & Carbonaro, 2003). As well, participants could not have any self-reported speech, language, hearing, or colour vision disorders. The colour vision requirement was included to ensure that participants would not have difficulty identifying the colours of the LEGO pieces. Thirty-two participants initially took part in the first session, which tested personality and cognitive measures. Two participants were unable to come in again to do the second session, and were compensated for their participation in the first session. Thirty participants – 15 dyads, five in each of the three conditions – initially took part in the second session. The audio for one of these dyads was lost due to a computer malfunction; two additional participants were then recruited who did both the first and second sessions. The average age of the 30 participants who 42  were included in the final analysis was 20 years 135 days (SD 1 year 288 days). Participants were compensated $10/hour for their participation.  2.2.3 Personality and cognitive measures collection In the first of the two experiment sessions, participants completed two personality measures and two cognitive measures. In this session, participants were assigned a code number by which they were identified throughout the rest of the study; these code numbers will sometimes be referred to in this dissertation. The first personality measure was the Autism Spectrum Quotient (AQ) (Baron-Cohen et al., 2001). This is a 50-item self-reporting questionnaire which measures five aspects of cognitive processing and communicative ability, including attention switching and attention to detail. Yu (2010) found that women with a lower AQ compensate for the effects of phonetic coarticulation less than men (who generally have a higher AQ than women) and women with a higher AQ, which could ultimately be a cause of sound change. Yu et al. (2013) found that participants’ attention switching subscore on the AQ was a predictor of whether they would converge with or diverge from a model talker. The AQ questionnaire was administered and scored through E-Prime 2.0 (Schneider et al., 2007).  The second personality measure was the Big Five Inventory (BFI) (John et al., 1991, 2008; Benet-Martinez & John, 1998). This is a 44-item self-reporting questionnaire in which participants rate themselves on a scale of 1 (Disagree strongly) to 5 (Agree strongly) on items indicating tendencies towards five traits: Extraversion (8 items), Agreeableness (9 items), Conscientiousness (9 items), Neuroticism (8 items), and Openness to experience (10 items). Yu et al. (2013) found that participants’ openness scores were predictors of whether they would 43  converge with or diverge from a model talker. 
The measure was administered through E-Prime 2.0. Only the participant’s responses were recorded by E-Prime; conversion of the reverse-scored items – for example, a participant who answers 1 (Disagree strongly) to the statement ‘I see myself as someone who tends to find fault with others’ would actually receive a 5 for agreeableness – and calculation of the final scores were done by hand (a sketch of this reverse-scoring conversion is given at the end of this section).

The first cognitive measure was the Automated Reading Span (RSPAN) measure (Turner & Engle, 1989; Engle, 2002; Unsworth et al., 2005). This task measures working memory capacity, which has been found to correlate with a variety of cognitive processing factors (see e.g., Engle, 2002), as well as to mediate perception of context-induced speech variation (Yu et al., 2011). The measure was administered through E-Prime 2.0, using a script from the Attention and Working Memory Lab at the Georgia Institute of Technology (http://psychology.gatech.edu/renglelab/Eprime2.html; retrieved February 1, 2013). Scoring was done automatically by E-Prime.

The second cognitive measure was a Mental Rotation measure (Shepard & Metzler, 1971; Moreau, 2013), which acts as a measure of participants’ spatial abilities and may correlate with their performance in the construction task. This test was administered online through the Java-based application developed by Krantz (2013), running through the Mozilla Firefox browser. The parameters used were essentially those of the “Original 3D experiment”, set to run three levels of rotation through 0, 90, and 180 degrees, with 10 test items per level, for a total of 30 test items. Scoring was done automatically by the application. Participants pressed the ‘Z’ key (marked with a piece of green tape) for objects which they believed were in the same orientation and the ‘/’ key (marked with a piece of red tape) for objects which they believed were in mirror orientation. After inspection of the pre-test results, it was found that one participant (126) only pressed the ‘Z’ key during her initial pre-test session. Some of her reaction times (e.g., less than 100 ms) suggested that she was not actually looking at the pairs of items before pressing the key. She was asked to come in again and re-do that test; the explanation given was that the computer had not logged her responses properly. She was available to re-do the test immediately before her construction task session; it was explained to her that the computer had difficulty logging responses if they were entered too quickly, and that it was more important to be accurate than fast. The responses from her second attempt at the test were included in the analysis.

The total time required for administration of these measures was between 35 and 50 minutes. Participants were told that their performance on these measures would not affect whether or not they could participate in the second part of the experiment. All participants completed the measures individually. In the cases where two participants were doing the measures at the same time, they were at separate computer workstations in separate sound-attenuated cubicles in the testing room. The order of the measures was counterbalanced across participants. The personality questionnaires were interspersed with the cognitive measures; participants never completed two personality questionnaires in a row or two cognitive measures in a row. All the measures were run on a Lenovo Intel Core2Duo computer running Windows XP, with a Lenovo ThinkVision 22-inch colour monitor.
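To make the reverse-scoring conversion concrete, a minimal sketch in Python follows. The item numbers and trait groupings shown here are placeholders invented for illustration (the actual BFI scoring key is given in John et al., 2008), and the function name is ours rather than part of E-Prime or the BFI materials.

    # Minimal sketch of Likert reverse-scoring on a 1-5 scale.
    # NOTE: item numbers and trait groupings are hypothetical;
    # the real BFI key is given in John et al. (2008).
    REVERSE_KEYED = {2, 12, 27}          # hypothetical reverse-keyed items
    TRAIT_ITEMS = {                      # hypothetical trait groupings
        "Extraversion": [1, 6, 11],
        "Agreeableness": [2, 7, 12],
    }

    def score_bfi(responses):
        """responses: dict mapping item number -> raw response (1-5).
        A reverse-keyed response r is converted to 6 - r, so that
        'Disagree strongly' (1) on a reverse-keyed item counts as 5."""
        keyed = {item: (6 - r) if item in REVERSE_KEYED else r
                 for item, r in responses.items()}
        return {trait: sum(keyed[i] for i in items)
                for trait, items in TRAIT_ITEMS.items()}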
2.2.4 Construction task

The dyads in the construction task consisted of two participants who had completed the cognitive and personality measures, who were available to participate in a one-hour session at the same time, and who either had not known each other prior to arriving for the construction task, or who had done the cognitive and personality measures in the same session and had not known each other prior to that session. The time between the first and second sessions varied by participant; the average time between sessions was 9 days (SD 8.3 days). Dyads were assigned to difficulty conditions sequentially; the first dyad to do the task was placed in the Easy condition, the second in the Medium condition, the third in the Hard condition, and so on.

Prior to beginning the task, participants filled out a consent form for use of their audio and video data. Participants were then seated on opposite sides of a large table, 47 inches by 47.5 inches, in wheeled office-style chairs with armrests. A barrier made of white foam board, 24 inches high by 30 inches wide, was placed in the centre of the table – 8.5 inches from either side – to prevent the participants from seeing each other and/or each other’s constructions (see Figure 2.2).

Figure 2.2 Room setup for construction task

Each participant wore an AKG C520 head-mounted condenser microphone. The microphone was on the right side of the participant’s face, approximately 2 cm from the right corner of the mouth. Both microphones were run through a Sound Devices USBPre 2; 44.1 kHz, 16-bit stereo audio recordings were made using Audacity 2.0.1 (downloaded from http://audacity.sourceforge.net, July 27, 2012).

Two Panasonic HC-V700M high definition video cameras were placed on tripods and positioned to the sides of the table. The mounting plate of the tripods was 36 inches above the floor, putting the centre of the camera lens approximately 37 inches above the floor. Participants were filmed from their left sides, so that the head-mounted microphones would not block the view of their mouth in the shots. The cameras captured an oblique view of all of the participant’s body visible above the table – usually from the top of the head to the lower chest/upper abdominal region, as well as arms and hands – and her construction. Shots were set up individually for each participant and were not changed during the task. The video recordings used iFrame format, recording at 30 frames per second progressive, with a frame size of 960 pixels by 540 pixels.

After testing the audio and video recording equipment, participants were randomly given either the first or second set of building instructions, a bag of LEGO pieces, and an instruction sheet for the experiment. The instruction sheet read as follows:

You and your partner will be building the same LEGO construction. Each of you has the same LEGO pieces, and each of you has half of the set of instructions necessary to build the construction. When it is your turn – as determined by the number of the step next to the picture – please explain to your partner what pieces she will need and how to put them together so that your constructions will match the ones shown in the pictures. If you get stuck, you may look back at previous steps; however, please do not look ahead at the upcoming steps.

Remember that your partner can’t see the instructions you have or the construction you are building, so be as clear as you can in giving your instructions.
You are encouraged to ask questions if you’re not clear what your partner is telling you to do. You may talk as much as you like while you are doing the task. There is no time limit on how long you have to complete the task.

When you have finished building, please leave your finished constructions on the table.

The instructions were also explained orally by the experimenter, and the participants were able to ask any questions they might have. The participants were instructed to begin once the experimenter had left the room, and were requested to let the experimenter know when they had completed the task. Participants were not told how many LEGO pieces they had been given, how many were needed for the task, how many steps there were in the task, or how many pieces they had to use in each step.

Following completion of the task, the participants completed a questionnaire asking them about the task they worked on that day, how well they thought they worked with their partner, how often they give and follow instructions in daily life, and about activities that might have affected their performance (see Appendix D). Participants were seated at the same table (in a room adjacent to the experiment room) while completing the questionnaire, and were told that only the researchers would see their responses. While the participants were completing the questionnaire, the number of errors (if any) they had made in their constructions was counted and recorded by the experimenter. Following Krych-Applebaum et al. (2007), errors were counted as the minimum number of pieces that would have to be changed for the construction to match the final instruction; if later errors resulted from building on earlier errors, those later errors were not counted, as they would not have occurred if the original error had not been made.

2.2.5 Preparation of participant recordings

The audio files were transcribed by the author using English orthography into Praat (Boersma & Weenink, 2012) textgrid files. The stereo audio files including both talkers were used so that context could assist in accurate transcription, and so that the interaction between the talkers and any resulting lexical convergence (i.e., possible areas of interest) could be identified. Transcription was done in two passes: the first one consisted primarily of putting in boundaries around turns and transcribing shorter passages (e.g., single words, very short phrases), while the second one involved transcribing longer passages and adjusting turn boundaries when necessary.

Boundaries were placed around what were identified as turns in the context of the particular dyad (Liddicoat, 2007; Sacks, Schegloff, & Jefferson, 1974); thus, what constituted a turn depended on the particular dyad. The main criterion for identifying a turn was identifying a segment of the conversation that accomplished what Sacks et al. (1974) call a ‘turn-job’ – i.e., a stretch of speech which did some particular task, such as giving an instruction, acknowledging understanding, clarifying, or asking for clarification. Turns were also identified using intonation (which also depended on the dyad; some participants used high rising terminal intonation – ‘uptalk’ – more than others), interruptions, and breaths (which also depended on the pair – some participants used breaths as a turn-holder to indicate that they were not finished speaking). Determining the transitions between steps was similarly based on the cues used by each dyad rather than on context-independent criteria.
The main indicators were verbal cues – e.g., finishing cues such as “I’m done” or “that’s all”, and handover cues such as “okay, I’ll flip” or “next step” – and the sounds of instruction pages flipping. In some dyads, not all steps were handed over with explicit transitions; in these cases, estimates of step transitions were made, sometimes with the aid of audible page flipping noises.

The transcription was done following the guidelines developed for use with the Forced Alignment and Vowel Extraction (FAVE) program suite developed by Rosenfelder et al. (2011). In addition to word transcription, breaths {BR}, noises {NS}, laughter {LG}, lip smacks {LS}, and coughs {CG} were indicated. Transcriptions such as “gonna”, “dunno”, “d’you”, and “lemme” were used where there did not seem to be separability into individual canonical words like “going to”, “don’t know”, “do you”, or “let me”.6

6 E.g., if an inspection of the audio signal, waveform, and spectrogram indicated that there was no closure for [t] and no changes in the characteristics of the nasal sound in a talker’s pronunciation of “going to”, particularly if these were coupled with an absence of formant change in the vowel in “going”, then this phrase would be transcribed as ‘gonna’.

After the initial transcription was completed, the left and right tracks of each dyad’s stereo file – each representing one participant’s audio input – were separated using Audacity and saved as mono .wav files to facilitate uploading to the FAVE website. Each dyad member’s transcription tier from the dyad’s Praat textgrid file was extracted into a separate file, and transformed into a tab-delimited text file via ELAN (Wittenburg et al., 2006). The .wav and .txt files were then uploaded to the FAVE site, which automatically aligned the words and phonemes for each participant. An early inspection of the FAVE alignment outputs indicated that the word alignment was much more accurate than the phoneme alignment; for this reason, the phoneme alignment outputs were ignored. The word alignment outputs were checked and, when appropriate, corrected manually based on the audio signal and on waveform and spectrographic information available in Praat.

2.3 Results

In this section, the conversations which make up the corpus are examined in three ways. To confirm that there were differences in difficulty between the conditions, the mean time required for task completion in each condition and by each dyad, and the number of errors made in each condition and by each dyad, are given in section 2.3.1. In section 2.3.2, the mean speaking time in each step will be explored; in particular, an asymmetry in speaking time between steps in which talkers are giving instructions and those in which they are receiving instructions will be discussed, as it will have an impact on the analyses developed in the rest of this thesis. In section 2.3.3, transcript extracts are given from one dyad to illustrate a typical interaction between the talkers.

2.3.1 Corpus statistics: time and error rates

The time required for each pair to complete the task was calculated from the beginning of speech after the experimenter had left the room to the time that one of the participants took off her head-mounted microphone to go and get the experimenter after the task was completed.
This time usually included an opening period, when the participants were orienting themselves to the task, and a closing period, when the participants were confirming that they had completed the task, in addition to the steps required to complete the task itself. The opening and closing times typically comprised between 1% and 2% of the total completion time (minimum 0.54%, maximum 3.1%).

Table 2.2 gives the completion time statistics for each condition. The Easy condition had both the shortest mean time to task completion and the shortest minimum time required to complete the task. These means and minimums increased in the Medium condition. The Hard condition had the longest mean time to task completion, the longest maximum time to completion, and the longest minimum time to completion.

Table 2.2 Completion time statistics for each condition (standard deviations in parentheses)

                                 Easy                           Medium                         Hard
Average time to complete task    33.0 minutes (14.27 minutes)   35.13 minutes (6.46 minutes)   38.58 minutes (14.41 minutes)
Maximum time to complete task    56.99 minutes                  42.0 minutes                   57.26 minutes
Minimum time to complete task    21.90 minutes                  24.64 minutes                  25.27 minutes

As discussed in section 2.2.4, errors were counted as the minimum number of pieces that would have to be changed for the construction to match the final instruction (following Krych-Applebaum et al., 2007). If later errors resulted from building on earlier errors, those later errors were not counted, as they would not have occurred if the original error had not been made. Table 2.3 gives the error rates for each condition. The smallest number of errors, and the lowest number of participants and of dyads with at least one error, were all found in the Easy condition. The number of errors and number of participants with errors both increased in the Medium condition; the number of dyads with at least one error was the same as in the Easy condition. The Hard condition had the highest number of errors, and all of the participants and dyads in that condition made at least one error.

Table 2.3 Error rates for each condition

                                                                    Easy     Medium   Hard
Total number of pieces placed in error in the condition             7/360    13/360   28/400
Percentage of pieces placed in error in the condition               1.94%    3.61%    7%
Number of participants with at least one error                      4/10     8/10     10/10
Number of dyads with at least one error                             4/5      4/5      5/5
Number of dyads in which both participants had at least one error   0/5      4/5      5/5

From Table 2.2 and Table 2.3, it can be seen that there were differences in completion time and error rate between the three conditions. Because of the small number of dyads in each condition, the significance of these differences was not calculated. The average time to completion was longer in the Hard condition than in the other two conditions, although the standard deviation was also quite large. The error rate in the Hard condition was almost twice that in the Medium condition, and more than three times that in the Easy condition. As well, it is notable that all of the participants in the Hard condition made at least one error7, while only four of the participants in the Easy condition did. Table 2.4 provides completion time and error information for each dyad.

7 In fact, all talkers in the Hard condition had one error in common: substitution of the brown 2x4 brick for the orange 2x4 brick (see step 3 of the Hard design in Appendix A.3).
It is possible that this error was a result of the quality of the printed instruction. Had more extensive pilot testing been conducted using the final designs, rather than only the initial testing described in section 2.2.1 above, steps may have been taken to avoid this error (e.g., reprinting the step in question).

Table 2.4 Completion time and error information for each dyad. Giver 1 indicates the first talker in the dyad to give instructions, and Giver 2 indicates the second talker in the dyad to give instructions.

Dyad   Giver 1   Giver 2   Condition   Time to complete (minutes:seconds)   Total Errors   Giver 1 Errors   Giver 2 Errors
E1     102       101       easy        56:59                                2              0                2
E2     109       108       easy        24:50                                2              0                2
E3     111       112       easy        26:16                                1              1                0
E4     125       124       easy        21:54                                0              0                0
E5     131       130       easy        35:00                                2              2                0
M1     105       103       medium      36:20                                0              0                0
M2     113       114       medium      37:50                                4              3                1
M3     115       126       medium      34:50                                5              3                2
M4     121       119       medium      24:38                                2              1                1
M5     127       132       medium      42:00                                2              1                1
H1     104       110       hard        28:50                                4              2                2
H2     106       107       hard        25:16                                4              3                1
H3     117       116       hard        30:47                                7              4                3
H4     128       118       hard        50:46                                6              4                2
H5     134       129       hard        57:16                                7              4                3

It is not clear why dyad E1 required almost 22 minutes more to complete the task than the next longest-to-completion dyad in the Easy condition. Unlike dyad H5, which had the longest completion time of any dyad, there was no single very long step (H5’s longest step took over 18 minutes) and no major construction breakages. They did not spend a great deal of time fixing errors, as dyad H4 did, and they did not make a large number of errors. Anecdotally, it was observed that both partners laughed quite a lot during the course of the task, and that they spent a fair amount of time trying to come up with descriptive terms for individual pieces and for the construction as a whole; this was similar to the behaviour observed in dyad M5, who had the longest completion time in the Medium condition (albeit still 15 minutes shorter than that of dyad E1).

2.3.2 Corpus statistics: gross speaking time

Gross speaking time was calculated for each step for each talker by removing all inter-turn pauses (a sketch of this computation is given at the end of this section). Overall, the average speaking time per step was 80.1 seconds (SD = 76.2 seconds). However, a significant asymmetry was observed between the steps in which talkers were giving instructions (the Giving steps) and the steps in which they were receiving instructions (the Receiving steps): that is, the Receiving steps were by and large shorter than the Giving steps. Eighty-four of the 200 Receiving steps (42%) were shorter than the shortest Giving step, which was 22.9 seconds long. Twenty-five of the 200 Receiving steps (12.5%) were shorter than 10 seconds. The trends in mean speaking time in the Giving and Receiving steps in each condition are illustrated in Figure 2.3.

Figure 2.3 Mean speaking time in seconds in Giving and Receiving steps in each condition. Error bars indicate ± 1 standard error of the mean.

Log-transformed mean speaking time, based on the mean speaking time for each talker, was used as the dependent variable in a repeated-measures ANOVA, with Condition (Easy, Medium, Hard) and Type (Giving, Receiving) as factors. Log-transformed values were used as the distribution of speaking times was skewed right; log transformation produced a more normal distribution.
There was a main effect of Condition [F(2, 27) = 8.445, p = 0.00142], and a main effect of Type [F(1, 27) = 226.736, p < 0.001].8 The mean by-talker speaking times were higher in the Giving steps than in the Receiving steps, and those in the more difficult conditions were higher than in the Easy condition. Given the results of the ANOVA, the mean speaking times for Giving and Receiving steps are reported separately in the tables below.

8 The same effects were found when untransformed mean speaking time values were used rather than log-transformed values.

Table 2.5 Speaking time in Giving steps by Condition (standard deviation in parentheses)

                             Easy                       Medium                       Hard
Average time per step        82 seconds (56.3 seconds)  130.7 seconds (68.1 seconds) 166.4 seconds (99.7 seconds)
Maximum step speaking time   340.1 seconds              396.2 seconds                615.2 seconds
Minimum step speaking time   22.9 seconds               39.9 seconds                 45.2 seconds

Table 2.6 Speaking time in Receiving steps by Condition (standard deviation in parentheses)

                             Easy                       Medium                       Hard
Average time per step        31.1 seconds (31 seconds)  42.2 seconds (30.8 seconds)  63.2 seconds (76.9 seconds)
Maximum step speaking time   150.8 seconds              164.5 seconds                493.4 seconds
Minimum step speaking time   3.1 seconds                6.5 seconds                  9.4 seconds

A one-way ANOVA using only the mean by-talker speaking time from the Giving steps, with Condition (Easy, Medium, Hard) as the factor, showed an effect of Condition [F(2, 27) = 9.678, p < 0.001]. Bonferroni-corrected (α = 0.0167) paired t-tests indicated that the mean by-talker Giving speaking time was higher in the Medium condition than in the Easy condition [t(9) = -3.259, p = 0.00986], and higher in the Hard condition than in the Easy condition [t(9) = -3.6116, p = 0.0056]. The difference between the mean by-talker Giving speaking times in the Medium and Hard conditions was not significant [t(9) = -1.4692, p = 0.1759].

As the mean by-talker speaking times for the Receiving steps were not normally distributed, a series of Wilcoxon rank sum tests was used to explore the differences in Receiving speaking times between conditions. The differences between the Easy and Medium Receiving speaking times (W = 27, p = 0.089) and between the Medium and Hard Receiving speaking times (W = 38, p = 0.393) were not significant at a Bonferroni-corrected α-level of 0.0167; however, there was a trend towards a difference between the Easy and Hard Receiving speaking times (W = 21, p = 0.0288), with the mean Receiving speaking time being higher in the Hard condition than in the Easy condition.

The Giving/Receiving speaking time asymmetry had effects on the perceptual judgment task and the acoustic similarity analysis described in Chapters 3 and 4, in which only phrases from Giving steps were used due to the greater availability of material, and on the analyses of speech rate and pausing behaviour differences in Chapters 5 and 6, in which speech rate and pause rate/percentage differences in Giving steps and in Receiving steps were considered separately.
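As an illustration of how gross speaking time was derived, the following is a minimal Python sketch. The tuple representation of turns is an assumption made for illustration; the corpus itself stores turns in Praat textgrid tiers, and the function names are ours.

    import math
    from collections import defaultdict

    # Assumed turn representation: (talker, step, start_sec, end_sec),
    # e.g. as exported from each dyad's Praat textgrid turn tier.
    def gross_speaking_times(turns):
        """Sum turn durations per (talker, step); summing over turns
        removes all inter-turn pauses, as described in section 2.3.2."""
        totals = defaultdict(float)
        for talker, step, start, end in turns:
            totals[(talker, step)] += end - start
        return totals

    def log_mean_speaking_time(totals, talker):
        """Log-transformed mean per-step speaking time for one talker,
        the dependent variable in the repeated-measures ANOVA."""
        times = [t for (tk, _step), t in totals.items() if tk == talker]
        return math.log(sum(times) / len(times))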
2.3.3 Samples of conversations

To illustrate the types of interactions that occurred during the task – in particular, the different types and lengths of responses by Receiving partners – two samples from dyad E4 are presented below. The first sample is step eight of their interaction, in which talker 124 is the Giver and 125 is the Receiver. This step has the least amount of Receiving speech (after inter-turn pauses were removed) of any step in the corpus (3.1 seconds); all of 125’s utterances are single words used as backchannels (see e.g., Schegloff, 1981; Clark & Wilkes-Gibbs, 1986; Bilous & Krauss, 1988; Clark & Schaefer, 1989) to indicate to 124 that she is understanding the instructions.9

9 Using the FAVE transcription conventions (Rosenfelder et al., 2011), parentheses around a word or phrase indicate an uncertain transcription, while (()) indicates an untranscribable utterance. [word]- indicates a partial word. ‘--’ indicates a restart. {BR} indicates a breath, {LG} indicates laughter, {NS} indicates noise, and {LS} indicates a lip smack. Using the Sacks et al. (1974) transcription conventions, brackets indicate overlapping speech. In addition to these conventions, ‘_whisper’ indicates a whispered word.

124: okay_whisper
124: alright. {BR} um {BR} now you'll take s- -- the {BR} -- hm. {BR} that orange or brown? {BR} {LG} okay {LG} (()) I've -- gonna guess that's an orange piece so you take your two by
125: {LG}
124 con’t: four orange piece, I think there's only one orange piece {BR} you put it directly on
125: yeah
124 con’t: top of the {BR} two by four green piece
125: okay
124: so right on top
125: mhm
124: {BR} and then you take your purple two by four piece and you put it right on the {BR}
125: mhm
124 con’t: two by four {BR} yellow piece
125: okay
124: and that’s all

The second sample is step 15 of the same interaction; now 125 is the Giver, and 124 is the Receiver. In this step, 124’s contribution to the conversation includes backchannels, questions, and, particularly in the second half of the step, expansions or reformulations of 125’s instructions to clarify her understanding of what she is being asked to do.

125: um you grab your yellow two by -- uh one by four
124: mhm
125: and you put it on top of the light green piece. so it sh- -- that it's like on the very cen- -- it's like centred, and it's parallel to the light green
124: okay
125: and then you take your triangular grey piece and then you put it on top of the {BR} red piece on your right side. s- -- s- -- s- -- and then you put it {NS} um {NS} {BR} -- how d'you explain? {BR} so you put it at the very -- in the middle. oh. I don't know. okay so
124: {LG}
125 cont: {BR} the {BR} long edge of the triangle {BR} is {NS} um at {NS} {BR} -- huh. it's on -- it sits on top of the {LS} red and white. {BR} um {BR} {NS} -- yeah. {LG} it's like
124: {BR} oh well okay s-
125 cont: o- -- in the centre
124: {BR} so so it's w- -- it's like sorta -- sort of if you like say that the really skinny skinny skinny edge, like it points to the right
125: yeah so it's like -- it's like a ship pointing
124: okay so {BR} a- -- and it's k- -- centred on the red and white? {BR}
125: yeah
124: okay
125: so it's n- -- y- -- like you don't put -- centred as in like the long edge, so there's one on -- one empty grey on top and then
124: and one empty grey on the bottom {BR} but there's no empty on the left {BR}
125: yeah. but -- but the side. yeah but the side is like the same
125 cont: {LS} {LG}
124: {BR} oh so -- but like on the left {BR}
125: like -- okay you don't go over -- you don't go over the green part
124: okay.
{BR} it just c- -- all completely on the right
125: yeah
124: okay

2.4 Summary

The audio elements of the corpus described in this chapter form the basis of the exploration of the effect of task difficulty on speech convergence which is the central question of this dissertation. Personality and cognitive measures and video footage of the participants were also collected; this material will be used in future studies. The corpus participants’ speech was examined in four ways, which will be detailed in the following chapters: Chapter 3 describes the perceptual judgment task used to determine if listeners’ similarity ratings of the participants’ voices over time change in the different difficulty conditions; Chapter 4 explores whether two global acoustic measures – the amplitude envelope technique used by Lewandowski (2012) and the mel-frequency cepstral coefficient (MFCC) analysis used by Kim (2012) – will indicate convergence between the dyad partners; Chapters 5 and 6 examine whether participants converge on speech rate and on pause rate and percentage measures, respectively, and whether that convergence is affected by task difficulty.

Chapter 3: Task difficulty and perceived convergence

3.1 Introduction

An important part of determining whether task difficulty affects convergence between talkers will be investigating whether listeners rate the vocal similarity of talkers in an easy task differently from that of talkers in a difficult task. Perceptual judgment tasks are some of the most common ways of assessing phonetic convergence (see e.g., Babel & Bulatov, 2012; Goldinger, 1998; Kim, 2012; Kim et al., 2011; Namy et al., 2002; Pardo, 2006; Pardo et al., 2010, 2012, 2013a, 2013b; Shockley et al., 2004). An advantage of using perceptual judgment tasks over analysis of acoustic properties was described by Goldinger (1998, p. 257): “Many acoustic properties can be cataloged and compared, but they may not reflect perceptual similarity between tokens – imitation is in the ear of the beholder. If imitation scores [based on acoustic-phonetic measures] miss the ‘perceptual Gestalt,’ more valid measures may come from perceptual tests”.

More recently, researchers have begun to use both perceptual judgment tasks and acoustic-phonetic measures to assess convergence; e.g., Babel & Bulatov (2012), Kim (2012), Pardo et al. (2012), Pardo et al. (2013a), and Pardo et al. (2013b). Pardo et al. (2013b) note that given the large number of acoustic-phonetic attributes which could be measured in any talker’s speech, it is likely that not only are talkers simultaneously converging on multiple attributes of a model talker’s or conversational partner’s speech, but they may also be converging on some attributes while at the same time diverging on others. Further, talkers might converge on different attributes for different items or different talkers (Pardo et al., 2013b, p. 184). In these instances, if only one acoustic-phonetic attribute is measured, and it is one which talkers do not converge on, then any convergence which may be occurring would be missed. However, measuring many attributes can be time-consuming and, if some attributes are converged on while others are diverged from, potentially confusing. Perceptual judgment tasks are thus a holistic way to gather listeners’ assessments of “global similarity across multidimensional aspects of acoustic-phonetic attributes simultaneously” (Pardo, 2013, p.
2), providing “a global measure of convergence that is grounded by what might be accessible to individuals during conversational interaction” (Pardo, 2013, p. 3).

Following this recent multi-pronged approach to measuring convergence, as part of the examination of whether task difficulty has an effect on speech convergence, a perceptual judgment task was run in which listeners who had not taken part in the construction task described in Chapter 2 were asked to rate the similarity of the construction task participants’ voices within the dyads. The remainder of the chapter will proceed as follows: Section 3.2 presents the methods used, including the process of phrase selection (3.2.1), the experimental procedure (3.2.2), and the participants in the experiment (3.2.3). Section 3.3 presents the results of the experiment. Section 3.4 is the discussion.

3.2 Methods

3.2.1 Materials: phrase selection

Each participant’s entire individual audio file was examined for phrases which could be used in a perception experiment. The criteria for a phrase to be selected were based on those used by Kim et al. (2011):

(1) Phrases had to be between 500 and 1500 ms in duration.

(2) Phrases had to consist of one intonational phrase or had to occur at the end of (but still be contained within) an intonational phrase.

(3) Phrases had to be fluently produced (i.e., no disfluencies, hesitations, laughter, breaths, etc.) and free from background noise (including speech from the other partner).

(4) Phrases had to be reasonably complete and easily recognizable by potential listeners outside of the larger context of the conversation; e.g., a phrase such as ‘what do you mean?’ would be included, but a phrase such as ‘what do you…?’ would not, even if it met the other criteria.

Potential phrases were indicated in a separate tier in the Praat textgrid that had been created for each talker following the initial transcription phase described in Chapter 2. Following completion of the phrase selection phase, a spreadsheet was created indicating phrase content, start time, end time, duration, at what point in the conversation the phrase occurred, what step the phrase occurred in, and whether the talker was giving or receiving instructions during that step.

The phrases for both partners in a pair were then compared to find potential pairs for use in the perception experiment. To be considered a pair, phrases had to be (1) within 100 ms of each other in duration, to give listeners similar amounts of information about each speaker; (2) in the same third of the conversation (i.e., first third, second third, or final third); (3) both questions or both statements, to control somewhat for intonational differences; and (4) both from steps where the participants were giving instructions or both from steps where the participants were receiving instructions. This last choice was made with the results of Pardo (2006) and Pardo et al. (2010, 2013a) in mind; in these studies, a talker’s role as giver or receiver affected whether and how much they converged to their partner.10 In the final phrase pairings, only phrases from steps in which the participants were giving instructions were used, as those phrases were more numerous for almost all participants. (A sketch of these pairing constraints is given below.)

10 Note that this decision, along with the choice to use an AX experiment design (described below), means that it is not possible to determine via this study if there is asymmetrical convergence: i.e., if one talker in a dyad is converging to the other, but not vice versa. Nevertheless, the convergence of the dyad as a whole can still be assessed.

Nine pairs of phrases were created for each dyad: three were taken from utterances in the first third of the conversation, three from utterances in the second third, and three from utterances in the final third.
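A minimal Python sketch of these pairing constraints follows. The dictionary representation and field names are invented for illustration and do not correspond to the actual spreadsheet columns.

    # Sketch of the pairing constraints from section 3.2.1. Each phrase is
    # assumed to be a dict (field names hypothetical) and to have already
    # passed the selection criteria (1)-(4) above.
    def can_pair(p, q):
        return (abs(p["dur_ms"] - q["dur_ms"]) <= 100     # similar durations
                and p["third"] == q["third"]              # same conversation third
                and p["is_question"] == q["is_question"]  # same sentence type
                and p["role"] == q["role"] == "giving")   # both from Giving steps

    def candidate_pairs(phrases_a, phrases_b):
        """All licit pairings between two dyad members' selected phrases."""
        return [(p, q) for p in phrases_a for q in phrases_b if can_pair(p, q)]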
Phrase pair items were counterbalanced, so that each talker was the first speaker in one order and the second speaker in the other order. In total, across all dyads, 270 (9 items x 2 orders x 15 dyads) trials were available for use. The full list of phrases is given in Appendix E.

3.2.2 Procedure

The perceptual task design used in this study was an AX design, in which listeners are presented with two stimuli of interest and asked to judge whether or to what degree those items are similar. This is different from the design used in most previous convergence studies, which used an AXB testing procedure, in which listeners are asked to decide which of two stimuli of interest (in this case, A or B) is most similar to a set stimulus X. The AXB approach works well given the design of many previous convergence studies, in which (1) baseline recordings of a talker’s productions of single words or phrases are made, (2) the talker is then exposed to a potential trigger for convergence (i.e., a model talker who is heard through headphones, or a conversational partner), and (3) the talker is re-recorded producing the single words or phrases which they produced initially. However, in the current study, as in Kim et al. (2011), the three-step approach to measuring convergence was not used; in particular, there were no pre-exposure or post-exposure recordings made of the talkers. Instead, convergence was intended to be measured over the course of the conversation. For this reason, it was decided that using an AX design – in which only two items are compared for similarity – would be more appropriate than using an AXB design. To preserve the dynamic nature of the temporal dimension, stimuli were taken from throughout the conversation, which was divided into thirds; listeners thus heard stimuli from each dyad from each third.

In piloting the experiment, it was found that listeners tended to lose focus if they were presented with all 270 trials; consequently, two versions of the experiment (AX and XA) were created which separated out the counterbalanced orders. In addition to the 135 trials in a particular order, listeners were first presented with a practice block, using nine items from one of the pairs involved in the initial construction task piloting; thus, each listener heard 144 trials. All trials were blocked by dyad; presentation of the blocks was randomized, and trial presentation was randomized within each block. A 200 ms inter-stimulus interval (ISI) was used between the items within a trial.

The experiment was administered through E-Prime 2.0 (Schneider et al., 2007). Listeners were instructed to rate the similarity of the voices in each trial on a six-point Likert scale, with 1 being ‘not similar at all’ and 6 being ‘extremely similar’. Listeners were also instructed to consider the voices in a holistic way, rather than focusing on any particular aspect of the voices. Responses were entered on the keypad of a Logitech K120 keyboard after the second item in a stimulus pair had finished playing; the response prompt timed out after 3000 ms, at which time the next trial began.
Listeners received a break after each block, and pressed the spacebar when they were ready to move on to the next block. Listeners heard the stimuli through AKG K240 Studio headphones. The procedure took approximately 20 minutes to complete.

3.2.3 Participants

Participants were recruited via visits to UBC Linguistics classes, as well as through the UBC Psychology Graduate Student Council’s Paid Participant Studies List (http://gsc.psych.ubc.ca/studies/paid_studies.html) and through members of the UBC Speech in Context lab. Participants had to be 18 years of age or older, self-reported native speakers of English, and have no speech, language, or hearing disorders. All participants were compensated $10/hour for their time.

Sixty-six participants took part in the experiment (34 in the AX order, 32 in the XA order). Participants were randomly assigned to one of the two orders. Four participants (three in the AX order, one in the XA order) missed more than 10% of their responses and were excluded from the final analysis; thus, a total of 62 participants (31 in each order; 10 male [four in AX, six in XA]) were included in the final analysis. The average age of these participants was 22 years 233 days (SD 4 years 310 days).

3.3 Results

Mean similarity ratings were calculated for each third of the conversation (first, second, final) for each condition (Easy, Medium, Hard). The higher the mean rating, the more similar the voices of the talkers in the dyad were judged to be. The similarity ratings decreased between the first (3.232, SD = 1.513) and second (3.119, SD = 1.475) thirds in the Easy condition, and then increased between the second and final (3.507, SD = 1.553) thirds. In the Medium condition, the mean ratings increased slightly but steadily over time (first: 3.022, SD = 1.351; second: 3.073, SD = 1.366; final: 3.154, SD = 1.418). In the Hard condition, the mean ratings increased between the first (3.095, SD = 1.563) and second (3.171, SD = 1.481) thirds of the conversation, and then decreased between the second and final (2.977, SD = 1.454) thirds. These results are illustrated in Figure 3.1.

Figure 3.1 Mean similarity ratings by condition and conversation third. Error bars indicate ± 1 standard error of the mean.

Log-transformed similarity ratings were used as the dependent measure in a repeated-measures ANOVA, with Condition (Easy, Medium, Hard) and Conversation Third (1, 2, 3) as factors. Log-transformed ratings were used as the distribution of similarity ratings was skewed right; log transformation produced a more normal distribution. There was a main effect of Condition [F(2, 116) = 5.446, p = 0.0055], a main effect of Conversation Third [F(2, 118) = 4.877, p = 0.009], and an interaction between Condition and Conversation Third [F(4, 244) = 17.81, p < 0.001].11 The results were then separated by condition (Easy, Medium, Hard), and one-way ANOVAs were conducted using log-transformed similarity ratings as the dependent measure and Conversation Third (1, 2, 3) as the factor. A difference was found in the Easy condition [F(2, 2578) = 12.98, p < 0.001], and a trend towards a difference was found in the Hard condition at a Bonferroni-corrected α-level of 0.0167 [F(2, 2755) = 4.039, p = 0.0177].
Post hoc pairwise t-tests on the log-transformed similarity ratings in the Easy condition showed that the significant differences were between Conversation Thirds 1 and 3 [t(1840) = -3.5946, p < 0.001] and between Conversation Thirds 2 and 3 [t(1838) = -4.9642, p < 0.001], where the similarity rating in the final third was higher than those in the first third and the second third.12

11 The same effects and interactions were found when untransformed similarity ratings were used rather than log-transformed ratings.

12 At the suggestion of the external examiner, Jennifer Pardo, we also examined these data by calculating the differences between subjects’ first third and final third average ratings for each dyad, and conducting an ANOVA on these data. There was a main effect of Condition [F(2, 122) = 25.27, p < 0.001]. Tukey HSD tests reveal differences between the Easy and Hard conditions (p < 0.001) and between the Medium and Hard conditions (p < 0.001); there was a trend towards a difference between the Easy and Medium conditions (p = 0.1). These results suggest that there was a greater change in perceived similarity over time in the Easy and Medium conditions than in the Hard condition. We maintain the in-text analysis in order to preserve the temporal dynamics of the middle third, which demonstrates the complexity of the behaviour.

Note that, as can be seen in Figure 3.2 below, there was a fair amount of variability in the patterns seen in each dyad when by-dyad means were calculated. In the Easy condition, three of the five dyads were rated as being more similar in the final third than in the first third, and four of the five dyads were rated as being more similar in the final third than in the second third; however, only two of the five were rated as being more similar in the second third than in the first third. These patterns are broadly similar to those found in the by-listener analysis illustrated in Figure 3.1. In the Medium condition, three of the five dyads were rated as more similar in the final third than in the first third, three dyads were rated as being more similar in the second third than in the first third, and three dyads were rated as more similar in the final third than in the second third. Again, the pattern in the by-dyad analysis is broadly similar to that found in the by-listener analysis. In the Hard condition, only two of the five dyads were rated as more similar in the final third than in the first third, and only one of the five was rated as more similar in the final third than in the second third; however, three dyads were rated as more similar in the second third than in the first third. Once again, this by-dyad pattern was generally similar to that of the by-listener analysis.

Mean by-dyad similarity was used as the dependent measure in a two-way ANOVA, with Condition (Easy, Medium, Hard) and Conversation Third (1, 2, 3) as factors. No main effects or interactions were found.

Figure 3.2 Mean similarity rating by dyad in each conversation third. Error bars indicate ± 1 standard error of the mean.

3.4 Discussion

The perceptual judgment task described in this chapter examined listeners’ ratings of holistic similarity over time of conversational partners’ voices from the construction task described in Chapter 2. Listeners heard nine pairs of utterances from each dyad: three from the first third of the conversation, three from the second third, and three from the final third.
If convergence was occurring, it was expected that listeners’ ratings would be higher for pairs of utterances later in the conversation. However, if difficulty was having an effect on convergence, it was expected that the difference between listeners’ early and late ratings would vary depending on the difficulty of the condition the conversational partners were in.

It was found that listeners’ similarity ratings did show different patterns depending on the difficulty condition. In the Easy condition, listeners rated voice pairings from the final third of a conversation as more similar than those in both the first third and the second third. In the Medium condition, the similarity ratings in the final third of the conversation trended higher than those in the first third and the second third. In the Hard condition, on the other hand, listeners’ ratings of talkers’ vocal similarity trended lower in the final third of the conversation than those in the first third and the second third.

When by-dyad means were calculated, it was found that there was a fair amount of variability in the dyads in each condition, with some dyads showing increased similarity ratings at the same time that others showed decreased similarity ratings in a given interval. Nevertheless, overall, the by-dyad patterns followed the patterns in the by-listener analysis: four of the five dyads in the Easy condition and three of the five dyads in the Medium condition received higher similarity ratings in the final third of their conversations than in the first third, while in the Hard condition, four dyads received similar or lower similarity ratings in the final third than in the first third. The consistency and reliability of these patterns could perhaps be increased, and the between-dyad variability perhaps reduced, by having a larger number of dyads in the sample. However, it could also be the case that different talkers – and thus different dyads – responded vocally to task difficulty in different ways, as suggested by the findings of Hecker et al. (1968) and Lively et al. (1993) in examining speech production under conditions of cognitive workload.

The significantly higher mean similarity rating in the final conversation third in the Easy condition is unlikely to be a result of such factors as talkers having additional exposure to their partner, as the minimum and average times to task completion were longer in the Medium and Hard conditions than in the Easy condition (see Table 2.2 in Chapter 2). If more exposure were a factor in convergence, it might be expected that dyad E1, who had the longest time to completion in the Easy condition, would have had the highest similarity ratings in the final conversational third. However, as can be seen in Figure 3.2, their final-third similarity rating was in the middle of the pack of the Easy ratings, and in fact decreased slightly from the second third to the final third. As well, it is unlikely that greater lexical similarity played a role in the higher similarity ratings; i.e., it was not the case that the lexical content of the stimulus pairs in the Easy condition was more similar than that in the Hard condition. As can be seen in Appendices B.1 and B.3, the pairs of items in the final third of the Easy condition do not appear to be more lexically similar than those in the final third of the Hard condition.

These findings suggest that increased task difficulty could be having an inhibitory effect on talkers’ convergence.
The explanation for this finding would be somewhat different depending on which of the explanations of convergence described in Chapter 1 one favours. In terms of the more automatic models – i.e., the perception-behaviour link posited by Bargh and colleagues (e.g., Dijksterhuis & Bargh, 2001; Chartrand & van Baaren, 2009) or the Pickering and Garrod (2004a, b; 2013) models of coupled speech production and perception – the lack of convergence in this case would be due to an inhibition of the non-conscious process. This could perhaps be due to a reallocation of attentional resources, which was shown to have an effect in Abel et al. (2011), where talkers in an auditory naming task displayed different patterns of convergence to a model talker depending on what they were asked to do while listening to the model. It could also be due to some conscious factor overriding unconscious imitation.

The possibility of inhibition of the perception-behaviour connection is explicitly discussed by Dijksterhuis and Bargh, who suggest that it could occur if “passive effects of perception are dominated by currently operating goals” (Dijksterhuis & Bargh, 2001, p. 29). It could be argued that more effective verbal communication and/or more rapport between interlocutors could lead to greater task success (see, e.g., the Communication Accommodation Theory approach), which is a goal that would be aided by speech mimicry. However, it may be the case that attending to the perceptual input to the degree that imitation requires increases a talker’s overall cognitive workload; in this case, to complete the task at hand – i.e., successfully building the LEGO construction – it would be advantageous to ignore those aspects of the perceived input and thus not display speech mimicry.

In Pickering and Garrod (2004a, b), a failure to align the representational channels could also be an intentional choice: for example, if the high-level task goals override a talker’s low-level alignment to their interlocutor’s previous productions (Pickering & Garrod, 2004b). However, Pickering and Garrod (2004b) suggest that such inhibition will be more difficult for a talker than simply aligning; in a case in which the goal of effective joint action – successful dialogic communication – is facilitated by alignment, and in which cognitive workload is already high, this would seem to be counterproductive. It is perhaps more likely that the reallocation of attentional resources is unintentional; however, exactly how this would happen is a subject for further study. In Pickering and Garrod’s (2013) approach, a talker who does not align may not be generating the appropriate forward models to allow for prediction of an interlocutor’s speech actions, again perhaps due to difficulty-induced cognitive workload.

In terms of Communication Accommodation Theory, converging to an interlocutor is thought to improve the effectiveness of communication (see e.g., Thakerar et al., 1982; Giles & Coupland, 1991; Giles et al., 1991; Giles & Ogay, 2007); thus, in a more difficult task, one might expect talkers to converge to increase their likelihood of success in the task. In the current study, none of the dyads in the construction task were told which condition they were in, or even that there were different conditions; thus, no participants began the task anticipating that they would need to increase their communicative effectiveness.
If that need instead emerged as part of the task, it would be expected that the dyads in the Hard condition would have displayed more convergence – i.e., would have received higher similarity ratings later in their conversations – than those in the easier conditions; however, that was not the case. Another possibility under a CAT approach concerns the attitudes the talkers had towards the task or towards each other. The non-conversational convergence study of Abrego-Collier et al. (2011) found that having a negative attitude towards a model talker caused participants to diverge from the model. Such a tendency may come into play if partners become frustrated with each other, which is perhaps more likely to occur in more difficult conditions. It was not obvious from the audio recordings that any of the participants became frustrated with their partners; examination of the video recordings may prove useful in exploring this possibility further, as participants may have expressed frustration through gestures or facial expression rather than through words. At this point, then, it is not clear how the lack of convergence seen here would be explained under a CAT approach; further research would be necessary to explore this possibility.

The results of this perception experiment suggest that difficulty could be affecting convergence, but they do not indicate which particular acoustic-phonetic aspects of speech talkers are converging on in the Easy and Medium conditions and not in the Hard condition. To further explore this phenomenon, in Chapter 4, global acoustic measures of similarity based on amplitude envelopes (Lewandowski, 2012; Hall et al., 2014) and on mel-frequency cepstral coefficients (Kim, 2012; Hall et al., 2014) will be used to determine if acoustic similarity ratings follow the same pattern as the perceived similarity ratings. In Chapters 5 and 6, talkers’ speech rates and pausing patterns, respectively, will be examined for evidence of convergence.

Chapter 4: Task difficulty and global acoustic measures of convergence

4.1 Introduction

While listeners’ assessments of talkers’ increasing similarity over time remain the gold standard for exploring speech convergence generally, researchers with an interest in acoustics and phonetics continue to look extensively at exactly which acoustic-phonetic speech features talkers may be converging on when they display this type of behaviour. Previously, a number of individual features have been measured for signs of convergence, including vocal intensity (Natale, 1975a), vowel quality (Babel, 2010, 2012), word duration (Abel et al., 2011), fundamental frequency (Babel & Bulatov, 2012; also see Gregory, 1990; Gregory et al., 1993; Gregory & Webster, 1996), and voice onset time (Abrego-Collier et al., 2011; Nielsen, 2011; Shockley et al., 2004). However, while focusing on an individual acoustic measure may yield evidence of convergence, it can also be a rather hit-and-miss affair: talkers may be becoming more similar on other measures that would be missed by a narrow line of inquiry. As mentioned in Chapter 3, researchers have recently begun to use both perceptual judgment tasks and acoustic-phonetic measures to assess convergence; e.g., Babel & Bulatov (2012), Babel et al. (2013), Kim (2012), and Pardo et al. (2012, 2013a, 2013b).
While these multi-pronged analyses better represent the overall picture of convergence, the acoustic-phonetic aspect of the analysis tends to focus on single features (e.g., fundamental frequency in Babel & Bulatov, 2012; Pillai scores measuring the degree of vowel merger in Babel et al., 2013) or clusters of single features (e.g., word/phrase duration and vowel measures in Pardo et al., 2012, 2013b; voice onset time and vowel measures in the monosyllabic word analysis in Kim, 2012).

It is also possible to examine more ‘global’ acoustic measures for evidence of convergence. The long-term average spectra (LTAS) measure used by Gregory and colleagues (Gregory, 1990; Gregory et al., 1993; Gregory & Webster, 1996) is one such example. However, in their analysis, they focused on the lower end of the LTAS, as their primary interest was in fundamental frequency; thus, any convergence at the higher end of the spectrum was missed. As well, to be successful, LTAS analysis requires measurement of longer stretches of speech – anywhere from 30 to 90 seconds is generally thought ideal (see e.g., Byrne et al., 1994; Kitzing, 1986; Klingholtz, 1990, inter alia) – which is either impractical or impossible in the types of word-based immediate shadowing tasks which are frequently used in convergence research. More recent work has used different types of global acoustic measures to assess speech convergence: Kim (2012) used mel-frequency cepstral coefficients (MFCCs) and a dynamic time warping algorithm, Lewandowski (2012) used amplitude envelope measurement and time-series cross-correlation, and Babel et al. (2014) used both of these measures. Unlike LTAS, these measures are used at a word or sentence level to assess convergence globally, without focusing solely on a single feature; they are intended to be “representations that more faithfully encode the speech signal as it unfolds over time without making specific assumptions about what types of cues might be extracted or which regions of the signal are the most important” (Wade et al., 2010, pp. 231-232).

In this chapter, as part of the examination of whether task difficulty has an effect on speech convergence, acoustic similarity analyses of the construction task participants’ voices are presented. In particular, acoustic similarity values are measured using both amplitude envelopes and MFCCs, and the changes in those values over time are examined for evidence of convergence. The remainder of the chapter will proceed as follows: Section 4.2 presents the methods used; Section 4.3 presents the results of the experiments; Section 4.4 is the discussion.

4.2 Methods

4.2.1 Materials

The basis for the acoustic similarity analysis was the nine pairs of phrases per dyad selected for the perceptual judgment task described in Chapter 3. As the use of both perceptual and acoustic measures in assessing speech convergence is intended to provide “a global measure of convergence that is grounded by what might be accessible to individuals during conversational interaction” (Pardo, 2013, p. 3), it is both a logical approach and an emerging common practice (Babel & Bulatov, 2012; Babel et al., 2013, 2014; Kim, 2012; Pardo et al., 2012, 2013) to use the same material in both types of analyses. As described in Chapter 3, the phrase stimuli included three pairs of utterances from the first third of each dyad’s conversation, three pairs of utterances from the second third, and three pairs of utterances from the final third.
All utterances were taken from steps in which talkers were giving instructions. (See section 3.2.1 for a full description of the selection of these pairs of utterances, and Appendix E for a full list of the content of these utterances.)

4.2.2 Procedures

Using the Phonological CorpusTools (PCT) software package (Hall et al., 2014), the 135 pairs of utterances (15 dyads x 9 utterance pairs/dyad) were subjected to both amplitude envelope and MFCC analysis. Prior to either kind of analysis, PCT preprocesses the audio files of the utterances: the waveform of each utterance is pre-emphasized, both to give a flatter spectrum and to compensate for the amplitude roll-off in the higher frequencies.

The amplitude envelope calculation procedure in PCT was developed following the method described in Lewandowski (2012). The acoustic signal is filtered into a number of logarithmically spaced bands using 4th order Butterworth bandpass filters; in this analysis, eight filter bands were used, with the minimum frequency set at 80 Hz (the typical low end of the human vocal range) and the maximum frequency set at 7800 Hz (the default settings in the PCT acoustic similarity analysis, based on a typical 16 kHz sampling rate for speech signals). The amplitude envelope itself is calculated by converting the acoustic signal to an envelope function via a Hilbert transform; the envelope is then downsampled to 120 Hz. Finally, in the envelope matching step, the envelopes for a pair of utterances are submitted to time-series cross-correlation analyses. This method compares two series of data points x and y measured at regular time intervals by shifting the points – or, in this case, the peaks and valleys of the acoustic envelope – relative to each other to find the best match between the two series. In this analysis, each of the eight frequency bands in the first member of a pair of utterances was cross-correlated with the corresponding frequency band in the second member of the utterance pair (i.e., first band with first band, second band with second band, and so on). The time series were normalized to sum to 1; completely matching signals would receive a cross-correlation value of 1, and completely opposite signals would receive a cross-correlation value of 0. The acoustic similarity value output was thus a value between 0 and 1; in Lewandowski (2012), similarity values ranged between 0.63 and 0.89.

In the MFCC analysis, the acoustic waveform is initially windowed and then Fourier-transformed into the linear frequency domain. Triangular filters are then constructed on a mel scale, which is a scale based on perceived intervals between pitches rather than absolute frequency in Hertz (Stevens et al., 1937) and thus gives greater weight to lower-frequency elements of the spectrum. In this analysis, 26 filters were applied to the Fourier-transformed spectrum (the number determined by PCT to be optimal for the analysis), and the spectrum was then represented as the log of the power in each of the filters. The mel-frequency cepstrum (a Fourier transform on the logarithm of the power spectrum of the signal; Childers et al., 1977) is then calculated using a discrete cosine transform (DCT), and a number of orthogonal coefficients – in this analysis, 12 – are returned. The MFCCs are then compared via a dynamic time warping (DTW) algorithm, which calculates the lowest-cost path through a distance matrix independent of time, and returns a minimum distance between the two signals.
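To make these two procedures concrete, the sketches below reimplement their main steps in Python with NumPy, SciPy, and librosa. They are illustrative approximations of the methods just described, not PCT's actual code: the exact band-edge spacing and the envelope normalization are assumptions made for the example, and the first sketch uses a standard peak normalized cross-correlation (so that identical envelopes score 1) in place of PCT's sum-to-one normalization.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert, resample

def amplitude_envelopes(signal, sr, n_bands=8, fmin=80.0, fmax=7800.0,
                        env_rate=120):
    """Band-filter a signal and return one downsampled envelope per band."""
    # logarithmically spaced band edges between fmin and fmax
    edges = np.logspace(np.log10(fmin), np.log10(fmax), n_bands + 1)
    envelopes = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        # 4th-order Butterworth bandpass filter for this band
        sos = butter(4, [lo, hi], btype='bandpass', fs=sr, output='sos')
        band = sosfiltfilt(sos, signal)
        # envelope via the analytic signal (Hilbert transform) ...
        env = np.abs(hilbert(band))
        # ... downsampled to 120 Hz
        envelopes.append(resample(env, int(len(env) * env_rate / sr)))
    return envelopes

def envelope_similarity(envs_a, envs_b):
    """Mean peak cross-correlation across corresponding frequency bands."""
    peaks = []
    for a, b in zip(envs_a, envs_b):
        # zero-mean, unit-norm versions, so perfectly matching envelopes
        # receive a score of 1 (an assumption standing in for PCT's
        # sum-to-one normalization)
        a = (a - a.mean()) / (np.linalg.norm(a - a.mean()) + 1e-12)
        b = (b - b.mean()) / (np.linalg.norm(b - b.mean()) + 1e-12)
        # 'full' mode scans all relative time shifts (lags); the peak is
        # the best alignment of the two envelopes
        peaks.append(np.correlate(a, b, mode='full').max())
    return float(np.mean(peaks))
```

The MFCC comparison can be sketched in the same spirit. Here librosa's MFCC and DTW routines stand in for PCT's internals, with 12 coefficients computed over 26 mel filters as in the analysis above; the length normalization and the final 1/(1 + distance) mapping are arbitrary choices made so that the output behaves like a similarity value, and do not claim to match PCT's own distance-to-similarity conversion.

```python
import librosa

def mfcc_dtw_similarity(wav_a, wav_b, sr=16000, n_mfcc=12, n_mels=26):
    """Compare two utterances via 12 MFCCs (26 mel filters) and DTW."""
    y_a, _ = librosa.load(wav_a, sr=sr)
    y_b, _ = librosa.load(wav_b, sr=sr)
    mfcc_a = librosa.feature.mfcc(y=y_a, sr=sr, n_mfcc=n_mfcc, n_mels=n_mels)
    mfcc_b = librosa.feature.mfcc(y=y_b, sr=sr, n_mfcc=n_mfcc, n_mels=n_mels)
    # dynamic time warping: D[-1, -1] is the minimum cumulative distance
    # along the lowest-cost path through the frame-by-frame distance matrix
    D, _ = librosa.sequence.dtw(X=mfcc_a, Y=mfcc_b, metric='euclidean')
    distance = D[-1, -1] / (mfcc_a.shape[1] + mfcc_b.shape[1])
    # map the length-normalized distance onto a 0-1 similarity scale
    return 1.0 / (1.0 + distance)
```

Applied to the 135 utterance pairs, functions like these would yield one similarity value per pair per measure, which can then be averaged by conversation third and condition as in Section 4.3.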
In this analysis, PCT was asked to output the distance value as a similarity value to facilitate comparison with the amplitude envelope results.

4.3 Results

4.3.1 Amplitude envelope measurement results

Mean similarity values of the amplitude envelopes for the utterances in each pair were calculated for each third of the conversation (first, second, final) within each condition (Easy, Medium, Hard) to determine whether convergence was occurring and, if so, whether it was being affected by task difficulty. The higher the mean values, the more similar the amplitude envelopes of the utterances are, and the more similar the spectral characteristics of the utterances will be.

Looking at each conversation third in each condition in Figure 4.1 below, no clear patterns emerge. In the Easy condition, the mean similarity value decreased from the first third (0.481, SD = 0.094) to the second third (0.429, SD = 0.07), and increased from the second third to the final third (0.465, SD = 0.073). In the Medium condition, the mean similarity value showed the opposite pattern, increasing from the first third (0.479, SD = 0.094) to the second third (0.516, SD = 0.074) and decreasing from the second third to the final third (0.478, SD = 0.106). In the Hard condition, the mean similarity value changed very little, with a slight increase from the first third (0.46, SD = 0.107) to the second third (0.47, SD = 0.041) and a slight decrease from the second third to the final third (0.459, SD = 0.089).

Figure 4.1 Mean amplitude envelope similarity values by condition and conversation third. Error bars indicate +/-1 standard error of the mean. Higher values indicate greater mean similarity between the amplitude envelopes in each utterance pair.

There was a fair amount of variability in the mean amplitude envelope similarity values when the values were examined by dyad, as shown in Figure 4.2. In the Easy condition, four of the five dyads showed a decrease in the mean similarity values between the first and second thirds, and an increase in the mean similarity values between the second and final thirds (i.e., the same pattern as in the Easy condition overall). In the Medium condition, three of the five dyads showed an increase in the mean similarity values between the first and second thirds, and a decrease in the mean similarity values between the second and final thirds (again, the same pattern as in the Medium condition overall). The between-dyad variability was greatest in the Hard condition: two dyads showed an increase in similarity values between the first and second thirds and a decrease in similarity values between the second and final thirds; one dyad showed the opposite pattern (decrease between the first and second thirds and increase between the second and final thirds); one dyad showed a consistent increase; and one dyad's values remained relatively stable throughout the conversation.

Figure 4.2 Mean amplitude envelope similarity by dyad. Error bars indicate +/-1 standard error of the mean. Higher values indicate greater mean similarity between the amplitude envelopes in each utterance pair.

Amplitude envelope similarity was used as the dependent measure in a repeated-measures ANOVA, with Condition (Easy, Medium, Hard) and Conversation Third (1, 2, 3) as factors. No main effects or interactions were found.
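For readers who want to see the shape of this analysis, the snippet below sketches a comparable mixed-design ANOVA in Python using the pingouin package, with Condition as a between-dyads factor and Conversation Third as a within-dyads factor. The input file name and column names are hypothetical stand-ins for a long-format table with one mean similarity value per dyad per third; this illustrates the kind of model reported here, not the analysis code actually used in this dissertation.

```python
import pandas as pd
import pingouin as pg

# hypothetical long-format table: one mean similarity value per dyad per
# third, with columns 'dyad', 'condition' (Easy/Medium/Hard),
# 'third' (1/2/3), and 'similarity'
df = pd.read_csv('envelope_similarity_by_dyad.csv')

# Condition varies between dyads; Conversation Third repeats within dyads
aov = pg.mixed_anova(data=df, dv='similarity', within='third',
                     subject='dyad', between='condition')
print(aov[['Source', 'DF1', 'DF2', 'F', 'p-unc']])
```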
4.3.2 MFCC results

As with the amplitude envelopes, mean similarity values of the MFCCs for the utterances in each pair were calculated for each third of the conversation (first, second, final) within each condition (Easy, Medium, Hard) to determine whether convergence was occurring and, if so, whether it was being affected by task difficulty. The higher the mean values, the more similar the MFCCs of the utterances are, and the more similar the spectral characteristics of the utterances will be.

Overall, the MFCC-based similarity values were much lower than those found in the amplitude envelope analysis. The changes over time in each condition were minimal, as illustrated in Figure 4.3. In the Easy condition, the mean MFCC similarity value decreased between the first (0.0187, SD = 0.0014) and second (0.0183, SD = 0.0019) thirds of the conversation, and increased very slightly between the second and final (0.0184, SD = 0.0021) thirds; this was the same general pattern seen in the amplitude envelope analysis, albeit to a lesser degree. The Medium condition also showed the same general pattern as in the amplitude envelope analysis, with the mean value increasing from the first third of the conversation (0.0196, SD = 0.0015) to the second third (0.0198, SD = 0.002), and decreasing from the second third to the final third (0.0194, SD = 0.0022). In the Hard condition, the mean MFCC similarity value increased very slightly between the first (0.0186, SD = 0.0022) and second (0.0187, SD = 0.0024) thirds, and slightly more between the second and final (0.019, SD = 0.0016) thirds.

Figure 4.3 Mean MFCC similarity by condition and conversation third. Error bars indicate +/-1 standard error of the mean. Higher values indicate greater mean similarity between the MFCCs in each utterance pair.

Examining the variability in MFCC similarity values over time by dyad, a wider range of patterns was found in the Easy condition than was found using the amplitude envelope measurements: two dyads showed the same pattern as the overall Easy results (decrease between first and second thirds and increase between second and final thirds), one dyad showed the opposite pattern, and two dyads decreased in both intervals. In the Medium condition, three dyads' MFCC values followed the overall pattern of increasing between the first and second thirds and decreasing between the second and final thirds, one dyad showed the opposite pattern, and one dyad increased very slightly in both intervals. In the Hard condition, there was more variation in the dyads than in the overall results: two dyads decreased and then increased, two dyads increased and then decreased, and one dyad increased steadily over time. These results are illustrated in Figure 4.4.

Figure 4.4 Mean MFCC similarity by dyad. Error bars indicate +/-1 standard error of the mean. Higher values indicate greater mean similarity between the MFCCs in each utterance pair.

MFCC similarity was used as the dependent measure in a repeated-measures ANOVA, with Condition (Easy, Medium, Hard) and Conversation Third (1, 2, 3) as factors. No main effects or interactions were found.

4.4 Discussion

The acoustic similarity analysis described in this chapter examined the similarity in amplitude envelopes (Lewandowski, 2012; Hall et al., 2014) and mel-frequency cepstral coefficients (MFCCs) (Kim, 2012; Hall et al., 2014) over time of utterances from the dyads in the construction task described in Chapter 2. The utterances were the nine pairs of utterances from each dyad used in the perceptual judgment task in Chapter 3: three from the first third of the conversation, three from the second third, and three from the final third.
If convergence was occurring, it was expected that the similarity values would be higher for pairs of utterances later in the conversation. However, if difficulty was having an effect on convergence, it was expected that the similarity values would vary depending on the difficulty of the condition the conversational partners were in. It was found that neither amplitude envelope similarity values nor MFCC values showed reliably or consistently different patterns in any of the conditions when examined over time.

As mentioned above, there was a fair amount of variability in the dyads in each condition, with some dyads showing increased similarity values at the same time that others showed decreased similarity values in a given interval. This variability could perhaps be reduced by having a larger number of dyads in the sample; however, it could also be reflective of factors other than difficulty playing a role in vocal convergence, such as personality differences (Yu et al., 2013), the distance between the talkers' native dialects of Canadian English (Kim et al., 2011; Kim, 2012), or how much attention the talkers were paying to each other versus the task at any given time (Abel et al., 2011; see also Mattys et al., 2009, 2014, and Mattys & Wiget, 2011). It could also be the case that different talkers have different vocal responses to the difficulty of the condition, which could then modulate any tendency towards increasing spectral similarity which might otherwise occur. For example, while some previous studies have shown that talkers' fundamental frequency (f0) will consistently increase under conditions of increased cognitive workload (e.g., Brenner et al., 1994; Griffin & Williams, 1987; Huttunen et al., 2011), others have shown that only some talkers show an increase, while others maintain their f0 or even decrease it under increased load (e.g., Lively et al., 1993; Scherer et al., 2002). Hecker et al. (1968) noted that there was no consistent pattern of spectral change in high-workload conditions across their sample of talkers, but that changes were generally consistent within talkers. Thus, without knowing how a talker's speech behaviour changes under workload without a partner present, it may be difficult to know what changes to look for with a partner present.

From a speech perception point of view, it is possible that the lack of reliable increases in similarity over time could be due to the attentional resource reallocation and consequent re-weighting of speech cues under conditions of cognitive workload proposed by Mattys and colleagues (Mattys et al., 2009, 2014; Mattys & Wiget, 2011). Under this assumption, talkers perceiving speech in high-load situations give more weight to the lexical-semantic information in the speech stream than they do to the acoustic-phonetic information, due to the higher communicative value of those lexical-semantic cues (Mattys et al., 2005). This re-weighting of cues could conceivably prevent talkers from taking in enough detailed acoustic-phonetic information to reliably show increased spectral similarity to their partners.
Given that changes to both speech production and speech perception are likely to apply under workload in conversation – talkers must both speak and perceive in order to participate – it may even be a combination of these changes leading to the lack of convergence.

While no reliable pattern of increasing spectral similarity was found, there was nevertheless a weak positive correlation between the ratings given by listeners to the pairs of utterances in the perceptual judgment task and the similarity values returned by the acoustic analyses: for the amplitude envelope analysis, the correlation was r(133) = .18, p = 0.037, while for the MFCC analysis, it was r(133) = .178, p = 0.039.[13] Thus, pairs of utterances which received higher ratings for similarity were somewhat more likely to also have higher acoustic similarity values, and pairs of utterances which were given lower similarity ratings were somewhat more likely to have lower acoustic similarity values. This suggests that acoustic similarity may have contributed to some degree to the listeners' ratings of the pairs of utterances, but that it was not the only factor in those ratings.

[13] The correlation between the two acoustic similarity measures was very similar to the correlations between each measure and the similarity ratings: r(133) = .177, p = 0.039.

As mentioned earlier, it has been suggested that listener judgments of similarity reflect "global similarity across multidimensional aspects of acoustic-phonetic attributes simultaneously" (Pardo, 2013, p. 2, emphasis JA), providing "a global measure of convergence that is grounded by what might be accessible to individuals during conversational interaction" (Pardo, 2013, p. 3, emphasis JA). Recall from section 3.2.2 that listeners were asked to rate the similarity of the pairs of voices they heard holistically, which means they were not limited to using only global spectral similarity in making their ratings, but could also recruit factors such as speech rate, prosodic features, or dialectal differences in making their judgments. Listeners were also not restricted in how they weighted these factors in making their decisions, or in how to interpret the instruction to judge 'similarity': for example, if two utterances had similar spectral characteristics but different speech rates, this could result in a high similarity rating from one listener but a low similarity rating from another. In terms of the spectrum itself, listeners could differently weight the various elements that would make up the global measure: not only fundamental frequency or formant values, but voice quality (creakiness, breathiness), overall amplitude (loudness), or variability in amplitude or fundamental frequency. If creakiness, for example, was particularly relevant to a listener, then they may have rated a talker with a creaky voice and a talker with a breathier voice as very different even if the other elements which were combined in the global acoustic similarity measure were similar. As well, recall that the listeners heard all nine pairs of utterances from each dyad in a block (in a randomized order). As their familiarity with the voices increased, their judgments may have evolved; no such familiarity was developed by the PCT algorithms.
The listeners in the experiment described in Chapter 3 thus may have been using acoustic similarity based on spectral measurements of the type made by amplitude envelopes and MFCCs in their similarity judgments, but only as one aspect of a more sophisticated analysis involving all of what is accessible to them, and what is relevant to them, when listening to conversations as they unfold over time. Overall, four of the dyads studied showed the same patterns of similarity changes in listener judgments, amplitude envelope measures, and MFCC measures: i.e., listener ratings of similarity and acoustic similarity values increased and decreased in the same intervals, although the magnitude of the change was not necessarily the same. Two of these dyads were in the Hard condition (H3 and H4), one was in the Medium condition (M4), and one was in the Easy condition (E3). It is thus also possible that spectral similarity may have played more of a role in some listeners' similarity ratings than it did for others.

It is possible that the material used in the spectral similarity analyses may have led to the lack of a reliable pattern of global acoustic convergence. As detailed in section 3.2.1, the utterances selected for the perceptual judgment task (which were then used in this analysis) were not selected for their lexical content; rather, they were selected based on factors such as phrasal constituency, length, and fluent production. (Appendix E contains the full list of phrases.) Lewandowski's (2012) analysis, which found convergence using amplitude envelope analysis, compared instances of the same word taken from pairs of talkers at different points in the Diapix task and in the reading list the talkers were asked to produce. Kim's (2012) MFCC analysis, which found convergence between talkers with different native languages and divergence between talkers with the same native language and dialect, used the same sentences uttered by the different talkers as its basis for comparison. It is likely that amplitude envelope or MFCC similarity measures are best suited for detecting convergence in utterances containing the same lexical items, given their reliance on spectral characteristics. Early in the course of this study, an attempt was made to use single words from the conversations as material for an amplitude envelope analysis, to more closely follow Lewandowski's (2012) analytic procedure. However, it became evident that due to the need for fluently produced tokens with minimal background noise produced in roughly the same position in a sentence in both early and late portions of the conversation for both talkers (see the conditions for token inclusion listed in Chapter 4 of Lewandowski, 2012), not enough usable tokens would be available in the shorter conversations to make this a feasible analysis technique. It may be the case that more restricted conversational contexts than were found in the current study are better suited for these kinds of acoustic convergence analysis.

In Chapters 5 and 6, we turn from global measures of similarity to more specific measures; in particular, whether convergence in talkers' speech rates and pausing patterns is affected by task difficulty.

Chapter 5: Task difficulty and speech rate convergence

5.1 Introduction

Changes in talkers' speech rate have been investigated in relation both to speech convergence and to speech production under conditions of cognitive workload.
These investigations have sometimes looked directly at units of speech per unit of time, where producing more units in a given amount of time yields a higher speech rate, and sometimes looked at global decreases in phrase, word, syllable, or segmental duration, where shortening the units under consideration allows more of them to be produced in a given amount of time. All the studies described in this chapter which looked at speech unit duration rather than speech rate described the observed decrease in speech unit duration as an increase in speech rate.

Changes in speech rate have been reported in a number of studies examining the effect of task difficulty on speech production, including Griffin & Williams (1987), Lively et al. (1993), Brenner et al. (1994), and Scherer et al. (2002). Using a variety of measures, including syllables per second (Brenner et al., 1994), word duration (Griffin & Williams, 1987), syllable duration (Scherer et al., 2002), and segmental and phrase durations (Lively et al., 1993), task difficulty has typically been found to increase speech rate when a talker is producing speech while performing a task. The cause of this increased speech rate has not been clearly identified. Brenner et al. (1994) imply that it is related to physiological changes (e.g., increased heart rate) resulting from psychological stress due to increased cognitive workload. Lively et al. (1993), on the other hand, suggest that talkers engage in what Lindblom (1990) describes as 'hyperspeech' in order to better function in high cognitive workload conditions; in this interpretation, decreased word duration allows talkers to "return their attention to the workload task" (Lively et al., 1993, p. 2963).

The degree to which talkers' speech rate was affected by more difficult tasks varied across the studies mentioned above, from a decrease in segmental and phrase duration for 4 of 5 talkers in Lively et al. (1993), to an increase in syllables per second by 13 of 17 speakers in Brenner et al. (1994), to decreases in word duration and syllable duration, respectively, in Griffin and Williams (1987) and Scherer et al. (2002). However, these talkers were not interacting with another talker while performing the task: rather, they were reciting sequences of numbers (0-9 in Griffin & Williams, 1987; 90-100 in Brenner et al., 1994) or repeating short phrases when prompted ('say [hVd] again' in Lively et al., 1993; 'this is task number ____' in Scherer et al., 2002).

Convergence in speech rate has also been examined in both the accommodation literature and the acoustic-phonetic sphere. While convergence has been found to occur in these domains, it does not always happen. Street Jr. (1982) found that listeners positively evaluated those talkers who reduced their words-per-minute speech rate to become more similar to that of an interviewer, and negatively evaluated those talkers whose speech rate increased and thus diverged from that of the interviewer. Jungers and Hupp (2009) demonstrated that listeners will converge to the fast or slow speech rate of a model talker when asked either to repeat the sentences produced by the model or to describe pictures spontaneously after having listened to the model. Two studies by Pardo and colleagues (Pardo et al., 2010, 2013a) show mixed results as to whether convergence occurred in speech rate.
These studies were based on Wilson and Wilson's (2005) proposal that conversational turn-taking is best modeled as a type of oscillation, and supposed that "[a] talker who intends to continue the conversation smoothly (without a prolonged delay) must be able to anticipate a turn transition point, which is governed by a number of factors, including articulation rate. The demand for close attention to an interlocutor's articulation rate should lead naturally to entrainment on rate" (Pardo et al., 2010, pp. 2255-2256). However, in Pardo et al. (2010), where Givers and Receivers of instructions in the map task remained in their roles throughout the entire task, talkers were not found to consistently converge or diverge on speech rates, as measured using time-series cross-correlation of words per second rates in one-minute intervals through the task. In Pardo et al. (2013a), where talkers switched between Giving and Receiving roles during the task, different results were found depending on whether measurements were made at the global level or at the local level. Global convergence on speech rates was found when talkers switched roles: i.e., when the original Givers became Receivers, and the original Receivers became Givers, their speech rates converged (Givers became slower and Receivers became faster). When talkers returned to their original roles, their speech rates diverged on a global level: Givers became faster and Receivers became slower. However, when convergence was measured in the same way as in Pardo et al. (2010) – i.e., time-series cross-correlation of words per second at one-minute intervals throughout the task – no consistent pattern of speech rate change was found for six of the eight pairs of talkers, and the two pairs which did show a consistent pattern diverged. In a study in which talkers did a task twice with different partners each time, Levitan and Hirschberg (2011) found that talkers were more similar to their task partners in terms of speech rate than they were to participants with whom they had not done the task, but they were even more similar to themselves when their two sessions were compared than they were to their partners in those sessions. These results suggest that talkers are "not automatically yoked to reproduce the temporal kinematics of an interacting partner, even in a collaborative task" (Pardo et al., 2013a, p. 290), and that although speakers will modify their speech rate to coordinate with a partner's, they also "tend to adhere to personal speaking behavior that carries across conversations" (Levitan & Hirschberg, 2011, p. 3183).

Given that task difficulty has been shown to have an effect on speech rate, and given that talkers have displayed a variety of convergence and divergence patterns when it comes to speech rate, this chapter explores the question of whether task difficulty has an effect on speech rate convergence, by examining whether the talkers in the construction task corpus (a) showed changes in their speech rate based on the difficulty condition they were in, (b) became more similar in their speech rates over the course of their interaction, and (c) if they became more similar, whether the difficulty condition had an effect on that similarity. Increased similarity over the course of a dyad's interaction could indicate that they were converging on speech rate, and an effect of condition could indicate that task difficulty was affecting that convergence.
Section 5.2 describes the methods used in this examination; Section 5.3 presents the results; Section 5.4 discusses the results.

5.2 Methods

5.2.1 Measuring speaking and articulation rate differences

A variety of techniques have been used to measure speech rate, including words per minute (Street Jr., 1982), words per second (Pardo et al., 2010; Pardo et al., 2013a), syllables per second (Brenner et al., 1994; Levitan & Hirschberg, 2011), word duration (Griffin & Williams, 1987), syllable duration (Scherer et al., 2002), and segmental and phrase durations (Lively et al., 1993). Given the conversational nature of the speech in the construction task corpus – which featured a wide variety of words and phrases, and in which speakers extensively reduced the duration and quality of words, syllables, and segments – words per second and syllables per second values were used for measuring speech rate in this dissertation. As well, because of the noisy nature of the recordings in the corpus – in particular, noise from talkers' overlapping speech and laughter, and noise from the manipulation of LEGO pieces – words per second and syllables per second were calculated based on the orthographic transcription of each talker's speech, rather than via an automatic method such as amplitude peak-picking.[14] While words per second and words per minute are the measures used previously in much convergence research (Pardo et al., 2010; Pardo et al., 2013a; Street Jr., 1982), these can be somewhat misleading, as they give the same weight to single-syllable words like 'the' as they do to multi-syllabic words like 'perpendicularly', both of which appear in the corpus under consideration. For this reason, both words per second and syllables per second were used in this dissertation.

[14] Orthography-based calculation was the method used by Pardo et al. (2010, 2013a) as well; while it is less reflective of 'real life' physiological articulation than the results of an amplitude peak-picking analysis would be, it nevertheless is a reasonable approximation of talkers' speech rate.

Words per second and syllables per second values were calculated for each talker in each Giving and each Receiving step of their conversation; speech from the opening and closing sections (see Section 2.3.1 in Chapter 2 for a description of these sections) was not included. For each step, a list of all words and non-word material and the duration of those elements was generated using Praat and ELAN. All inter-turn pauses were removed from the list, providing a total amount of speaking time for each step. Two different ways of calculating words per second and syllables per second rates were used. The first used all turn time – i.e., all time that a talker used for turns in a particular step, not including inter-turn pauses. This included all complete words; all filled and unfilled pauses; all laughter, coughs, and noises; and any other material in the step which was speech-like but not counted as a complete word (see further details below). The resulting word and syllable rates using this time calculation will be called 'speaking rates' in this chapter. The second method used only the time required to articulate the words, as well as any unfilled pauses and breaths less than 250 ms in length, which were taken to be part of the articulation process (see the discussion in Grosjean & Lane, 1976, and Miller & Grosjean, 1981).
The resulting words/syllables per second rates are referred to as 'articulation rates'. While articulation rate was the measure used by Pardo et al. (2010, 2013a), it is not necessarily reflective of 'real-life' speech rates: the values are generally higher than would be the case if the time used for all elements of a talker's speech were included. As well, a total turn-time-based speaking rate measure allows for a comparison of word and syllable rates with rates of other elements in speech, such as the rates of filled and unfilled pauses which will be explored in Chapter 6. In this study, both the articulation rate and the speaking rate were measured, both in terms of units per second. The term 'speech rate' is used to refer to speaking rate and articulation rate collectively.

In determining the number of words, all words were included except for those:

- which were untranscribable or uncertainly transcribed. Untranscribable or uncertainly transcribed material was found in 56.75% (227 of 400) of the steps. In 149 of the steps (65.6%) with untranscribable material, the duration of that material comprised less than 1% of the total speaking time in the step; the highest percentage was 7.6%, in one Receiving step in dyad M3's interaction.

- which were incomplete. Partial words were found in 72.5% (290 of 400) of the steps. In 184 of the steps (63.4%) with partial words, the duration of those words comprised less than 1% of the total speaking time in the step; the highest percentage was 11.1%, in one Receiving step in dyad E5's interaction.

- which were laughed through, sung through, or yawned through, as their durations were often extraordinarily elongated. Words of this type were found in 50.5% (202 of 400) of the steps. In 73 of the steps (36.1%) with laughed-through, sung-through, or yawned-through words, the duration of those words comprised less than 1% of the total speaking time in the step; the highest percentage was 10.9%, in one Receiving step in dyad M4's interaction.

While these elements were not counted as words, the time used to produce them was included in the calculations of speaking time.

In terms of syllables per second measures, the syllabification of the words in the corpus was completed using the CMU pronouncing dictionary, version 0.7a (2008; http://svn.code.sf.net/p/cmusphinx/code/trunk/cmudict/cmudict.0.7a, retrieved June 20, 2014). Where no syllabification was listed, the word was syllabified based on the most similar words which were in the dictionary (e.g., 'bajillion' was syllabified based on the CMU syllabification of 'million'), on the syllabification of the components of compound words which were in the dictionary (e.g., 'leftmost' was syllabified using the components 'left' and 'most'), and/or on the CMU syllabification of the word plus the syllable count of any affixes which had been added to it (e.g., 'lilac-y'). A list of words which did not appear in the CMU dictionary and the syllabifications of them used in this dissertation is given in Appendix F.

Words per second and syllables per second were used to measure whether difficulty had an effect on speaking and/or articulation rate; based on the workload literature, the expectation was that words/syllables per second would be higher in the more difficult conditions. Mean speaking rate and articulation rate for each talker were used to determine whether there were differences in speaking rate or articulation rate between the three conditions.
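As a concrete illustration of the two rate definitions above, the sketch below computes all four rates for a single step from a token list of the kind exported from the Praat/ELAN transcriptions. The Token structure, the kind labels, and the syllables dictionary (word-to-syllable-count mappings of the sort derived from the CMU dictionary) are hypothetical stand-ins invented for the example, not the actual data structures used in this study.

```python
from dataclasses import dataclass

@dataclass
class Token:
    text: str    # orthographic form ('' for non-word material)
    dur: float   # duration in seconds, from the Praat/ELAN transcriptions
    kind: str    # e.g. 'word', 'pause', 'breath', 'laugh', 'partial', 'noise'

def step_rates(tokens, syllables, min_pause=0.250):
    """Speaking and articulation rates for one Giving or Receiving step.

    `tokens` has inter-turn pauses already removed; untranscribable,
    partial, and laughed/sung/yawned-through items carry non-'word' kinds,
    so they contribute time but not words or syllables.
    """
    words = [t for t in tokens if t.kind == 'word']
    n_words = len(words)
    n_sylls = sum(syllables[t.text.lower()] for t in words)

    # speaking rate denominator: all turn time, including filled and
    # unfilled pauses, laughter, coughs, and other non-word material
    speaking_time = sum(t.dur for t in tokens)

    # articulation rate denominator: word time plus unfilled pauses and
    # breaths under 250 ms, treated as part of the articulation process
    artic_time = sum(t.dur for t in tokens
                     if t.kind == 'word'
                     or (t.kind in ('pause', 'breath') and t.dur < min_pause))

    return {'wps_speaking': n_words / speaking_time,
            'sps_speaking': n_sylls / speaking_time,
            'wps_artic': n_words / artic_time,
            'sps_artic': n_sylls / artic_time}
```

A list like this would be built once per Giving or Receiving step and per talker, yielding the by-step rate values used in the analyses that follow.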
Then, the absolute differences between words per second values and between syllables per second values for dyad partners early and late in the conversation were measured; if talkers were becoming more similar in their rates, these differences should be smaller later in the conversation than they were earlier in the conversation. These absolute differences were measured in three ways:

- in each step; i.e., between one talker's Giving words/syllables per second and the other talker's Receiving words/syllables per second for Step One, Step Two, etc. This is similar to the method used in Pardo et al. (2010, 2013a), in which Givers' speech rates were compared to those of Receivers.

- between the words per second and syllables per second values in the Giving steps for each partner. In this instance, the talkers' Giving steps were paired in order of occurrence. Thus, the first pairing would have Talker 1's Giving words/syllables per second in Step 1 (their first Giving step) and Talker 2's Giving words/syllables per second in Step 2 (their first Giving step); the second pairing would have Talker 1's Giving words/syllables per second in Step 3 (their second Giving step) and Talker 2's Giving words/syllables per second in Step 4 (their second Giving step); and so forth.

- between the words per second and syllables per second values in the Receiving steps for each partner, in the same way as for the Giving steps.

These differences were measured for each step in the first instance, for each Giving step pairing in the second instance, and for each Receiving step pairing in the third instance. However, as each condition had a different number of steps – i.e., 18 (nine Giving and nine Receiving for each partner) in the Easy condition, 12 (six Giving and six Receiving for each partner) in the Medium condition, and 10 (five Giving and five Receiving for each partner) in the Hard condition – not all of the steps were used in the analysis. In the by-step words/syllables per second difference analyses, the first four steps (Early) and the last four steps (Late) for each dyad were used; in the Giving and Receiving difference analyses, the first two same-type pairings (Early) and the last two same-type pairings (Late) for each dyad were used.

5.2.2 Measuring cross-correlation in speech and articulation rates

Following the method used by Pardo et al. (2010, 2013a), the words per second and syllables per second values for the talkers in each dyad, for both speaking rate and articulation rate, were submitted to time-series cross-correlation analyses in addition to the analysis of rate differences. Time-series cross-correlation is used to compare two series of data points x and y measured at regular time intervals. In many systems, the best correlations of data sampled in this way are found if one series is time-shifted relative to the other. These time shifts are indicated by the 'lag' value: a negative lag indicates that the value of x at time t is correlated with the value of y at some later time, while a positive lag indicates that the value of x at time t is correlated with the value of y at some earlier time. For example, in this study, consider the speech rate value for each talker in a dyad for each step of their task.
If the speech rate value of partner x is significantly correlated with that of partner y at a lag of -3, this means that the speech rate value of partner x in Step 1 would be correlated with that of partner y in Step 4, x in Step 2 with y in Step 5, et cetera. On the other hand, if the speech rate of x is significantly correlated with that of partner y at a lag of 3, this means that the speech rate of partner x in Step 4 would be correlated with that of partner y in Step 1, x in Step 5 with y in Step 2, and so on. A negative lag is sometimes described as x leading y, and a positive lag as x lagging y (see e.g., Shumway & Stoffer, 2014). As in any correlation, coefficients can be positive or negative; Pardo et al. (2010, 2013a) interpreted significant positive coefficients as indicative of convergence and negative coefficients as indicative of divergence. In the current study, Giver 1 was x in the analysis of each step and in the analysis of Giving steps only, and Giver 2 (who is also Receiver 1) was x in the analysis of Receiving steps only.

5.3 Results

5.3.1 Overall speaking and articulation rates

To determine if difficulty had an effect on participants' speaking and articulation rates, the overall rates in each condition were measured in both words per second and syllables per second.

5.3.1.1 Words per second

Mean words per second rates based on by-talker means, for both speaking rate and articulation rate, were examined by Condition and Type of step. Overall, words per second speaking rate values were higher in the Receiving steps than in the Giving steps in all three conditions. The mean words per second speaking rate in Giving steps was 2.33 (SD = 0.35) in the Easy condition, 2.23 (SD = 0.27) in the Medium condition, and 2.34 (SD = 0.47) in the Hard condition. In the Receiving steps, the mean words per second speaking rate was 2.4 (SD = 0.37) in the Easy condition, 2.44 (SD = 0.46) in the Medium condition, and 2.64 (SD = 0.36) in the Hard condition. There was more variation in words per second in the Receiving steps for the Easy and Medium conditions, but less in the Hard condition, as can be seen in Figure 5.1.

Figure 5.1 Boxplot of words per second speaking rates by Condition and Type

Mean words per second speaking rate was used as the dependent variable in a two-way ANOVA, with Condition (Easy, Medium, Hard) and Type (Giving, Receiving) as factors. There was a marginal effect of Type [F(1, 54) = 3.855, p = 0.0547], with mean words per second trending higher in the Receiving steps than in the Giving steps. No effect of Condition and no interaction between Condition and Type were found. This suggests that difficulty was not having an effect on talkers' words per second speaking rate values, but that whether they were giving or receiving instructions did tend to have an effect on their speaking rate.

In terms of words per second articulation rate, the values were higher in the Receiving steps than in the Giving steps for the Medium and Hard conditions, but were roughly the same in both types of steps in the Easy condition. In the Giving steps, the words per second articulation rates were 3.41 (SD = 0.35) in the Easy condition, 3.46 (SD = 0.3) in the Medium condition, and 3.47 (SD = 0.35) in the Hard condition. In the Receiving steps, the articulation rates were 3.42 (SD = 0.32) in the Easy condition, 3.66 (SD = 0.34) in the Medium condition, and 3.78 (SD = 0.33) in the Hard condition.
The distributions of the words per second articulation rates are shown in Figure 5.2.   Mean words per second articulation rate was used as the dependent variable in a two-way ANOVA, with Condition (Easy, Medium, Hard) and Type (Giving, Receiving) as factors. There was a marginal effect of Type [F(1, 54) = 4.009, p = 0.0503], with mean words per second trending higher in the Receiving steps than in the Giving steps. No effect of Condition was found, nor was any interaction between Condition and Type. Again, as with speaking rate, this suggests that difficulty was not having an effect on talkers’ words per second articulation rate values, but that whether they were giving or receiving instructions did tend to have an effect on their articulation rates.  5.3.1.2 Syllables per second Mean syllables per second rates based on by-talker means were also examined by Condition and Type of step. Overall, as in the words per second analysis, the syllables per second speaking rates were higher in the Receiving steps than in the Giving steps in all Figure 5.2 Boxplot of words per second articulation rates by Condition and Type 102  conditions, with the highest rates found in the Hard condition. The mean syllables per second speaking rate in Giving steps was 2.74 (SD = 0.43) in the Easy condition, 2.66 (SD = 0.34) in the Medium condition, and 2.78 (SD = 0.55) in the Hard condition. In the Receiving steps, the mean syllables per second speaking rate was 3.11 (SD = 0.54) in the Easy condition, 3.21 (SD = 0.5) in the Medium condition, and 3.36 (SD = 0.4) in the Hard condition. The distribution of the rates in each condition is illustrated in Figure 5.3.    Mean syllables per second speaking rate was used as the dependent variable in a two-way ANOVA, with Condition (Easy, Medium, Hard) and Type (Giving, Receiving) as factors. An effect of Type was found [F(1, 54) = 18.443, p < 0.001]; mean syllables per second was higher in the Receiving steps than in the Giving steps. No effect of Condition and no interaction between Condition and Type was found. Again, as in the words per second speaking rate analysis, it was the talkers’ role at a given point in a conversation – i.e., whether they were giving or receiving instructions – which had an effect on their syllables per second speaking rates, not the difficulty of the task they were working on. Figure 5.3 Boxplot of syllables per second speaking rates by Condition and Type 103   In terms of syllables per second articulation rates, the values were again higher in the Receiving steps than in the Giving steps. In the Giving steps, the syllable per second articulation rates were 4.01 (SD = 0.46) in the Easy condition, 4.13 (SD = 0.41) in the Medium condition, and 4.13 (SD = 0.4) in the Hard condition. In the Receiving steps, the values were 4.43 (SD = 0.42) in the Easy condition, 4.84 (SD = 0.43) in the Medium condition, and 4.81 (SD = 0.32) in the Hard condition. The distribution of the syllables per second articulation rates is shown in Figure 5.4.   Mean syllables per second articulation rate was used as the dependent variable in a two-way ANOVA, with Condition (Easy, Medium, Hard) and Type (Giving, Receiving) as factors. An effect of Type was found [F(1, 54) = 33.147, p < 0.001]; mean syllables per second was higher in the Receiving steps than in the Giving steps. No effect of Condition and no interaction between Condition and Type were found. 
Once again, difficulty did not have an effect on talkers' syllables per second articulation rates, while whether they were giving or receiving instructions did.

Figure 5.4 Boxplot of syllables per second articulation rates by Condition and Type

Thus, there were no reliable effects of task difficulty on either speaking or articulation rate for either words per second or syllables per second values. However, there was an effect of talkers' role in the task – whether they were giving or receiving instructions – which was statistically significant in the syllables per second analyses: talkers spoke more quickly in their Receiving steps than they did in their Giving steps.

5.3.2 Speaking and articulation rate differences between partners

This section presents the results of the analyses of the changes in absolute differences in dyad partners' speech rates over time. In all the analyses, a reduced absolute difference over time would suggest that talkers were becoming more similar in their speech rates, which could be a result of convergence. Overall, very few consistent and reliable global patterns were found, although some dyads showed fairly consistent behaviour across the different measures.

5.3.2.1 Speaking and articulation rate differences in each step

Looking at differences in speech rates in each step – i.e., in which one talker was Giving instructions and the other was Receiving – the mean absolute differences in words per second speaking rate are given in Table 5.1. Overall, the difference between words per second speaking rates increased over time in the Easy and Hard conditions, suggesting that the dyads were becoming less similar in speaking rates, but decreased over time in the Medium condition, suggesting talkers' speech rates were becoming more similar.

Table 5.1 Mean absolute words per second speaking rate differences by step in Early and Late portions of the conversations (standard deviations in parentheses)

Time     Easy          Medium        Hard
Early    0.46 (0.5)    0.7 (0.55)    0.47 (0.43)
Late     0.55 (0.32)   0.54 (0.51)   0.54 (0.53)

As can be seen in Figure 5.5, there was a fair amount of variation in the mean Early and Late absolute words per second speaking rate differences within each condition. In the Easy condition, two dyads' words per second differences decreased slightly over time (indicating that they were becoming more similar), two dyads' differences increased somewhat (indicating that they were becoming less similar), and one dyad's difference increased noticeably. In the Medium condition, four dyads' words per second differences decreased over time to differing degrees, while one dyad's difference increased. In the Hard condition, three dyads' words per second differences decreased somewhat over time, while two dyads' differences increased noticeably over time.

Figure 5.5 Absolute Early and Late words per second speaking rate differences in each dyad by Condition. Error bars indicate +/-1 standard error of the mean. Higher values indicate larger absolute differences between talkers' words per second speaking rates.
Inspection of a histogram of the absolute words per second speaking rate differences indicated that the data were not normally distributed; this was confirmed by a Shapiro-Wilk normality test (W = 0.8717, p < 0.001). Absolute words per second difference was used as the dependent measure in a series of Wilcoxon rank sum tests for each condition, with Time (Early and Late) as the independent measure. No Bonferroni-corrected (p < 0.0167) significant differences between Early and Late times were found for the absolute words per second speaking rate difference in any of the three conditions (Easy: W = 137, p = 0.091; Medium: W = 237, p = 0.3273; Hard: W = 196, p = 0.9254). The difficulty of the task was thus not found to have an effect on the absolute differences between dyad partners' words per second speaking rates over time.

Moving to the changes in similarity in talkers' words per second articulation rates over time, the mean absolute differences are given in Table 5.2. The absolute differences in rates decreased over time in the Medium condition, suggesting that the talkers were becoming more similar in their words per second articulation rates over time; however, the rates increased slightly from Early to Late in the Hard condition, suggesting talkers were becoming less similar, and remained stable in the Easy condition.

Table 5.2 Mean absolute words per second articulation rate differences by step in Early and Late portions of the conversations (standard deviations in parentheses)

Time     Easy          Medium        Hard
Early    0.36 (0.26)   0.66 (0.45)   0.46 (0.38)
Late     0.36 (0.38)   0.38 (0.25)   0.5 (0.55)

Figure 5.6 shows the by-dyad variation in the words per second articulation rate absolute differences in each condition. In the Easy condition, three of the five dyads showed decreases in their absolute differences between the Early and Late portions of the conversations, suggesting their words per second articulation rates were becoming more similar. In the Medium condition, all dyads showed a decrease in absolute difference over time. In the Hard condition, on the other hand, three of the five dyads showed an increase in their absolute words per second articulation rate differences over time, suggesting that they were becoming less similar.

Figure 5.6 Absolute Early and Late words per second articulation rate differences in each dyad by Condition. Error bars indicate +/-1 standard error of the mean. Higher values indicate larger absolute differences between talkers' words per second articulation rates.

Inspection of a histogram of the absolute words per second articulation rate differences indicated that the data were not normally distributed; this was confirmed by a Shapiro-Wilk normality test (W = 0.8474, p < 0.001). Absolute words per second articulation rate difference was used as the dependent measure in a series of Wilcoxon rank sum tests for each condition, with Time (Early and Late) as the independent measure. A trend towards a Bonferroni-corrected (p < 0.0167) significant difference was found between the Early and Late times in the Medium condition (W = 284, p = 0.0227), but not in the Easy (W = 220, p = 0.6017) or Hard (W = 199, p = 0.9893) conditions. The difficulty of the task was thus not found to have a reliable effect on the absolute differences between dyad partners' words per second articulation rates over time, although there was a trend towards a difference in the Medium condition.

In terms of syllables per second speaking rates, the mean absolute differences are given in Table 5.3. As in the words per second speaking rate absolute difference analysis, the difference between talkers' rates increased between the Early and Late periods of the conversations in the Easy and Hard conditions, suggesting that talkers were becoming less similar in syllables per second speaking rates, but decreased over time in the Medium condition, suggesting that they were becoming more similar.
Table 5.3 Mean absolute syllables per second speaking rate differences by step in Early and Late portions of the conversations (standard deviations in parentheses)

Time     Easy          Medium        Hard
Early    0.65 (0.64)   0.89 (0.65)   0.7 (0.51)
Late     0.94 (0.59)   0.74 (0.61)   0.82 (0.68)

As can be seen in Figure 5.7, there was a fair amount of variability in the mean Early and Late absolute syllables per second speaking rate differences within each condition. In the Easy condition, three dyads' absolute syllables per second differences increased somewhat over time, indicating that the talkers were becoming less similar in speaking rate, while two dyads' differences stayed relatively stable. In the Medium condition, four dyads' absolute syllables per second differences decreased by varying degrees over time, indicating that they were becoming more similar in speaking rate, while one dyad's difference increased slightly over time. In the Hard condition, three dyads' differences increased by varying degrees over time, while two dyads' differences decreased.

Figure 5.7 Absolute Early and Late syllables per second speaking rate differences in each dyad by Condition. Error bars indicate +/-1 standard error. Higher values indicate larger differences between dyad partners' syllables per second speaking rates.

Inspection of a histogram of the absolute syllables per second speaking rate differences indicated that the data were not normally distributed; this was confirmed by a Shapiro-Wilk normality test (W = 0.8979, p < 0.001). Absolute syllables per second difference was used as the dependent measure in a series of Wilcoxon rank sum tests for each condition, with Time (Early and Late) as the independent measure. No Bonferroni-corrected (p < 0.0167) significant differences between Early and Late times were found for the absolute syllables per second speaking rate difference in any of the three conditions, although the Easy condition trended towards an increase in the absolute difference from Early to Late (Easy: W = 117, p = 0.02447; Medium: W = 226, p = 0.4945; Hard: W = 187, p = 0.7381). As in the words per second analysis, difficulty was not found to reliably affect the absolute difference between the dyad partners' syllables per second speaking rate values.

The absolute differences in syllables per second articulation rate in the Early and Late portions of the conversations in each condition are given in Table 5.4. The absolute rate differences decreased over time in the Medium and Hard conditions, suggesting that talkers' articulation rates in those conditions were becoming more similar to their partners'; however, they increased slightly in the Easy condition, suggesting that the talkers were becoming less similar.

Table 5.4 Mean absolute syllables per second articulation rate differences by step in Early and Late portions of the conversations (standard deviations in parentheses)

Time     Easy          Medium        Hard
Early    0.55 (0.32)   1.01 (0.66)   0.77 (0.52)
Late     0.62 (0.46)   0.7 (0.5)     0.64 (0.6)

The by-dyad syllables per second articulation rate absolute differences in each condition are illustrated in Figure 5.8. In the Easy condition, two dyads' differences decreased from the Early to Late portions of the conversations, indicating that they were becoming more similar in their syllables per second articulation rates; two dyads' differences increased over time, indicating decreasing similarity in their articulation rates; and one dyad's difference stayed stable.
In the Medium condition, four of the five dyads' syllables per second articulation rate differences decreased over time, indicating a general tendency in that condition for talkers to become more similar to their partners. In the Hard condition, three dyads' articulation rate differences decreased over time, again suggesting that they were becoming more similar.

Figure 5.8 Absolute Early and Late syllables per second articulation rate differences in each dyad by Condition. Error bars indicate +/-1 standard error. Higher values indicate larger differences between dyad partners' syllables per second articulation rates.

Inspection of a histogram of the absolute syllables per second articulation rate differences indicated that the data were not normally distributed; this was confirmed by a Shapiro-Wilk normality test (W = 0.9179, p < 0.001). Absolute syllables per second articulation rate difference was used as the dependent measure in a series of Wilcoxon rank sum tests for each condition, with Time (Early and Late) as the independent measure. No Bonferroni-corrected (p < 0.0167) significant differences between Early and Late times were found for the absolute syllables per second articulation rate differences in any of the three conditions (Easy: W = 193, p = 0.862; Medium: W = 255, p = 0.1417; Hard: W = 244, p = 0.2423). Difficulty was not found to affect the absolute difference between the dyad partners' syllables per second articulation rate values.

Overall, task difficulty was not found to globally affect talkers' speech rates when the absolute differences in each step were examined over time. There were some trends towards change – in the Medium condition in the words per second articulation rate analysis, and in the Easy condition in the syllables per second speaking rate analysis – but no systematic effects of difficulty were observed.

5.3.2.2 Speaking and articulation rate differences in Giving steps

The mean absolute differences in words per second speaking rate in the Early and Late portions of the conversations in each condition for the Giving steps only are listed in Table 5.5. Overall, the absolute differences either decreased very slightly over time, suggesting that talkers' words per second speaking rates were becoming somewhat more similar, or remained stable.

Table 5.5 Mean absolute words per second speaking rate differences by step in Early and Late portions of the conversations, Giving steps only (standard deviations in parentheses)

Time     Easy          Medium        Hard
Early    0.48 (0.39)   0.44 (0.36)   0.44 (0.34)
Late     0.38 (0.17)   0.37 (0.31)   0.44 (0.37)

As can be seen in Figure 5.9, there was a fair amount of by-dyad variation in the absolute words per second speaking rate differences between the Early and Late Giving steps of the conversations in each condition. In the Easy condition, two dyads' words per second speaking rate differences decreased over time, suggesting that they were becoming more similar in their speaking rates, while three dyads' differences increased to varying degrees over time, suggesting that they were becoming less similar. In the Medium condition, two dyads' differences decreased noticeably between the Early and Late portions of the conversation, one dyad's difference decreased to a lesser degree, and two dyads' differences increased. In the Hard condition, two dyads' differences decreased over time, two dyads' differences increased noticeably over time, and one dyad's difference increased very slightly over time.

Figure 5.9 Absolute Early and Late words per second speaking rate differences in each dyad, Giving steps only, by Condition. Error bars indicate +/-1 standard error of the mean. Higher values indicate a larger absolute difference between dyad partners' words per second speaking rates in the Giving steps.
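Since the screen-then-test routine used throughout Sections 5.3.2.1 and 5.3.2.2 – a histogram and Shapiro-Wilk check, followed by per-condition Wilcoxon rank sum tests against a Bonferroni-corrected alpha of .05/3 ≈ .0167 – recurs many times, the sketch below spells it out once in Python with SciPy. The input file and column names are hypothetical stand-ins for a long-format table of absolute rate differences; SciPy's mannwhitneyu implements the Wilcoxon rank sum test, with its U statistic corresponding to the W values reported in the text.

```python
import pandas as pd
from scipy.stats import shapiro, mannwhitneyu

# hypothetical long-format table of absolute rate differences,
# with columns 'condition', 'time' (Early/Late), and 'diff'
df = pd.read_csv('abs_rate_differences.csv')

# screening step: Shapiro-Wilk test of the pooled absolute differences
W, p_norm = shapiro(df['diff'])
print(f'Shapiro-Wilk: W = {W:.4f}, p = {p_norm:.4f}')

# if non-normal, run one Wilcoxon rank sum (Mann-Whitney) test per
# condition, comparing Early vs. Late absolute differences
alpha = 0.05 / 3  # Bonferroni correction over the three conditions
for cond, sub in df.groupby('condition'):
    early = sub.loc[sub['time'] == 'Early', 'diff']
    late = sub.loc[sub['time'] == 'Late', 'diff']
    U, p = mannwhitneyu(early, late, alternative='two-sided')
    verdict = 'significant' if p < alpha else 'n.s.'
    print(f'{cond}: W = {U:.0f}, p = {p:.4f} ({verdict})')
```

When the Shapiro-Wilk test does not reject normality, as in the words per second articulation rate analysis below, a parametric ANOVA is used in place of the Wilcoxon tests.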
Inspection of a histogram of the words per second speaking rate absolute differences suggested that the data were not normally distributed; this was confirmed by a Shapiro-Wilk normality test (W = 0.9093, p < 0.001). Absolute words per second speaking rate difference was used as the dependent measure in a series of Wilcoxon rank sum tests for each condition, with Time (Early and Late) as the independent measure. No Bonferroni-corrected (p < 0.0167) significant differences between Early and Late times were found for the absolute words per second speaking rate difference in any of the three conditions (Easy: W = 52, p = 0.9118; Medium: W = 56, p = 0.6842; Hard: W = 51, p = 0.9705), indicating that difficulty was not globally affecting the absolute difference between the dyad partners’ words per second speaking rates over time in the Giving steps.

The mean absolute words per second articulation rate differences in the Early and Late portions of the conversations in each condition are indicated in Table 5.6. The Medium condition showed a decrease in the articulation rate difference over time, suggesting that talkers in that condition were becoming more similar in their rates, while the difference increased slightly over time in the Easy and Hard conditions, suggesting that partners’ articulation rates might be becoming less similar.

Table 5.6 Mean absolute words per second articulation rate differences in Early and Late portions of the conversations, Giving steps only (standard deviations in parentheses)

         Condition
Time     Easy         Medium       Hard
Early    0.34 (0.14)  0.43 (0.2)   0.27 (0.21)
Late     0.36 (0.19)  0.29 (0.16)  0.33 (0.22)

The by-dyad variation in the words per second articulation rate differences in each condition is shown in Figure 5.10. In the Easy condition, three of the five dyads showed an increase in their absolute articulation rate difference between the Early and Late portions of their conversations. In the Medium condition, all dyads showed a decrease in their absolute difference over time. In the Hard condition, two dyads showed a decrease in their differences, two showed an increase, and one remained stable.

Figure 5.10 Absolute Early and Late words per second articulation rate differences in each dyad, Giving steps only, by Condition. Error bars indicate +/-1 standard error of the mean. Higher values indicate a larger absolute difference between dyad partners’ words per second articulation rates in the Giving steps.

Inspection of a histogram of the words per second articulation rate absolute differences suggested that the data were normally distributed; this was confirmed by a Shapiro-Wilk normality test (W = 0.9739, p = 0.2255). Absolute words per second articulation rate difference was used as the dependent measure in a repeated-measures ANOVA, with Condition (Easy, Medium, Hard) and Time (Early, Late) as factors. No effect of either Condition or Time was found, nor was any interaction between them, indicating that difficulty was not affecting the absolute difference between the dyad partners’ words per second articulation rates over time in the Giving steps.
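Since Condition varies between dyads while Time varies within them, an ANOVA of this kind is naturally realized as a mixed design. The following sketch uses the pingouin package’s mixed-model ANOVA for illustration; this is not the software used for the analysis above, and the long-format table, its column names, and its values are all hypothetical stand-ins.

```python
# Hedged sketch of an ANOVA with Condition (between dyads) and Time
# (within dyads) as factors, assuming one mean absolute rate difference
# per dyad per time point in long format. Dyad labels and values are
# hypothetical.
import pandas as pd
import pingouin as pg

data = pd.DataFrame({
    "dyad":      ["E1", "E1", "E2", "E2", "M1", "M1",
                  "M2", "M2", "H1", "H1", "H2", "H2"],
    "condition": ["Easy"] * 4 + ["Medium"] * 4 + ["Hard"] * 4,
    "time":      ["Early", "Late"] * 6,
    "abs_diff":  [0.34, 0.36, 0.41, 0.29, 0.43, 0.30,
                  0.39, 0.28, 0.27, 0.33, 0.31, 0.35],
})

aov = pg.mixed_anova(data=data, dv="abs_diff", within="time",
                     subject="dyad", between="condition")
print(aov[["Source", "F", "p-unc"]])  # main effects and interaction
```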
The mean absolute syllables per second speaking rate differences for the Giving steps in each Condition, Early and Late, are given in Table 5.7. As in the words per second speaking rate analysis, the absolute differences either decreased over time or remained relatively stable.

Table 5.7 Mean absolute syllables per second speaking rate differences in Early and Late portions of the conversations, Giving steps only (standard deviations in parentheses)

         Condition
Time     Easy         Medium       Hard
Early    0.57 (0.42)  0.7 (0.45)   0.5 (0.45)
Late     0.51 (0.25)  0.49 (0.37)  0.52 (0.41)

As can be seen in Figure 5.11, there was a fair amount of by-dyad variation in the absolute syllables per second speaking rate differences between Early and Late times in the Giving steps in each condition. In the Easy condition, two dyads’ absolute syllables per second differences decreased over time, suggesting that their speaking rates were becoming more similar, while three dyads’ differences increased to some degree over time, suggesting that they were becoming less similar in speaking rates. In the Medium condition, only one dyad’s absolute syllables per second difference increased over time, while the other four dyads showed a decreased difference over time. In the Hard condition, three dyads’ differences decreased over time by varying degrees, and two dyads’ differences increased over time.

Figure 5.11 Absolute Early and Late syllables per second speaking rate differences in each dyad, Giving steps only, by Condition. Error bars indicate +/-1 standard error of the mean. Higher values indicate larger absolute differences between the dyad partners’ syllables per second speaking rates in the Giving steps.

Inspection of a histogram of the absolute syllables per second speaking rate differences indicated that the data were not normally distributed; this was confirmed by a Shapiro-Wilk normality test (W = 0.9369, p = 0.004). Absolute syllables per second speaking rate difference was used as the dependent measure in a series of Wilcoxon rank sum tests for each condition, with Time (Early and Late) as the independent measure. No Bonferroni-corrected (p < 0.0167) significant differences between Early and Late times were found for the absolute syllables per second differences in any of the three conditions (Easy: W = 52, p = 0.9118; Medium: W = 63, p = 0.3527; Hard: W = 48, p = 0.9118), indicating that, just as in the words per second speaking rate analysis, difficulty was not affecting the absolute difference between the dyad partners’ syllables per second speaking rate values in the Giving steps over time.

Turning finally to the syllables per second articulation rate absolute differences over time in the Giving steps, the means in each Condition and each portion of the conversation are given in Table 5.8. The differences increased slightly over time in the Easy condition, decreased in the Medium condition, and remained stable in the Hard condition.

Table 5.8 Mean absolute syllables per second articulation rate differences in Early and Late portions of the conversations, Giving steps only (standard deviations in parentheses)

         Condition
Time     Easy         Medium       Hard
Early    0.46 (0.25)  0.48 (0.23)  0.31 (0.3)
Late     0.51 (0.23)  0.36 (0.16)  0.3 (0.24)

Looking at the by-dyad syllables per second articulation rate absolute differences over time in each condition, Figure 5.12 shows that in the Hard and Easy conditions, three and four of the five dyads’ absolute differences, respectively, increased between the Early and Late portions of the conversation, suggesting that they were becoming less similar in their articulation rates.
In the Medium condition, on the other hand, four of the five dyads’ syllables per second articulation rate differences decreased over time, suggesting that their rates were becoming more similar.

Figure 5.12 Absolute Early and Late syllables per second articulation rate differences in each dyad, Giving steps only, by Condition. Error bars indicate +/-1 standard error of the mean. Higher values indicate larger absolute differences between the dyad partners’ syllables per second articulation rates in the Giving steps.

Inspection of a histogram of the syllables per second articulation rate absolute differences suggested that the data might not be normally distributed; a Shapiro-Wilk normality test indicated only a trend towards a non-normal distribution (W = 0.9641, p = 0.07488). Absolute syllables per second articulation rate difference was used as the dependent measure in a series of Wilcoxon rank sum tests for each condition, with Time (Early and Late) as the independent measure. No Bonferroni-corrected (p < 0.0167) significant differences between Early and Late times were found for the absolute syllables per second differences in any of the three conditions (Easy: W = 44, p = 0.6842; Medium: W = 66, p = 0.2475; Hard: W = 46, p = 0.7959), indicating that, just as in the words per second articulation rate analysis, difficulty was not affecting the absolute difference between the dyad partners’ syllables per second articulation rate values in the Giving steps over time.

Overall, task difficulty was not found to globally affect talkers’ speaking or articulation rates when the absolute differences in the Giving steps were examined over time.

5.3.2.3 Speaking and articulation rate differences in Receiving steps

The mean absolute words per second speaking rate differences for the Receiving steps, Early and Late, are given in Table 5.9. While the Easy and Hard conditions showed a slight increase in the absolute difference in words per second rates over time, suggesting the talkers might be becoming less similar in their speaking rates between the Early and Late portions of the conversation, the Medium condition showed a large decrease over time, suggesting the talkers’ speaking rates were becoming more similar over the course of the conversation.

Table 5.9 Mean absolute words per second speaking rate differences in Early and Late portions of the conversation, Receiving steps only (standard deviations in parentheses)

         Condition
Time     Easy         Medium       Hard
Early    0.56 (0.62)  1.04 (0.57)  0.53 (0.38)
Late     0.64 (0.41)  0.57 (0.51)  0.6 (0.59)

As can be seen in Figure 5.13, there was a fair amount of by-dyad variation in the absolute words per second speaking rate differences between Early and Late times in the Receiving steps in each condition. In the Easy condition, three dyads’ absolute differences increased over time, indicating that their speaking rates were becoming less similar. In the Medium condition, all dyads’ differences decreased over time to varying degrees, indicating that their speech rates were becoming more similar. In the Hard condition, three dyads’ differences decreased over time, and two dyads’ differences increased.

Figure 5.13 Absolute Early and Late words per second speaking rate differences in each dyad, Receiving steps only, by Condition. Error bars indicate +/-1 standard error of the mean. Higher values indicate larger absolute differences between the dyad partners’ words per second speaking rates in the Receiving steps.
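The absolute-difference measure that underlies Tables 5.3 through 5.12 can be made concrete with a short sketch: for each step, the absolute difference between the two partners’ rates is taken, and those differences are then averaged within the Early and Late windows. The function below is a hypothetical illustration rather than the analysis code; the four-step window is an illustrative assumption.

```python
# Minimal sketch of the Early/Late absolute-difference measure. For
# each step, one talker was Giving and the other Receiving; the
# absolute difference between their rates is computed per step, and
# the Early and Late means are then compared. Data layout and window
# size are illustrative assumptions.
def early_late_abs_diffs(rates_a, rates_b, window=4):
    """rates_a / rates_b: per-step rates for the two dyad partners,
    in step order. Returns (mean Early difference, mean Late difference)."""
    abs_diffs = [abs(a - b) for a, b in zip(rates_a, rates_b)]
    early = abs_diffs[:window]
    late = abs_diffs[-window:]
    return sum(early) / len(early), sum(late) / len(late)

# Hypothetical syllables-per-second speaking rates for one dyad:
talker_1 = [3.9, 4.1, 3.6, 4.4, 4.0, 3.8, 4.2, 3.7, 4.1, 3.9]
talker_2 = [3.2, 3.5, 3.1, 3.6, 3.5, 3.6, 3.9, 3.6, 4.0, 3.8]
early_mean, late_mean = early_late_abs_diffs(talker_1, talker_2)
# A Late mean lower than the Early mean would be read as the partners
# becoming more similar in rate, consistent with convergence.
print(f"Early: {early_mean:.2f}, Late: {late_mean:.2f}")
```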
Inspection of a histogram of the absolute words per second speaking rate differences indicated that the data were not normally distributed; this was confirmed by a Shapiro-Wilk normality test (W = 0.9057, p < 0.001). Absolute words per second speaking rate difference was used as the dependent measure in a series of Wilcoxon rank sum tests for each condition, with Time (Early and Late) as the independent measure. No Bonferroni-corrected (p < 0.0167) significant differences between Early and Late times were found for the absolute words per second difference in any of the three conditions (Easy: W = 38, p = 0.393; Medium: W = 74, p = 0.0753; Hard: W = 52, p = 0.9118), indicating that difficulty was not affecting the dyad partners’ absolute differences in words per second speaking rates in the Receiving steps over time.

For the words per second articulation rate absolute differences, as shown in Table 5.10, the Easy and Hard conditions again showed a slight increase in the absolute difference in words per second rates over time, suggesting the talkers might be becoming less similar in their articulation rates, while the Medium condition showed a large decrease over time, suggesting the talkers’ articulation rates were becoming more similar over the course of the conversation.

Table 5.10 Mean absolute words per second articulation rate differences in Early and Late portions of the conversation, Receiving steps only (standard deviations in parentheses)

         Condition
Time     Easy         Medium       Hard
Early    0.47 (0.41)  1.09 (0.4)   0.47 (0.35)
Late     0.48 (0.42)  0.32 (0.35)  0.52 (0.46)

In terms of by-dyad absolute difference changes over time, in the Easy condition, four of the five dyads showed a decrease in the difference in their words per second articulation rates over time, suggesting they were becoming more similar; the one dyad which showed an increase showed a dramatic one (over a word per second), as can be seen in Figure 5.14. In the Medium condition, all five dyads showed a decrease in their articulation rate difference between the Early and Late portions of the conversation. However, in the Hard condition, three of the five dyads showed an increase in their absolute articulation rate difference over time, suggesting that they were becoming less similar.

Figure 5.14 Absolute Early and Late words per second articulation rate differences in each dyad, Receiving steps only, by Condition. Error bars indicate +/-1 standard error of the mean. Higher values indicate larger absolute differences between the dyad partners’ words per second articulation rates in the Receiving steps.

Inspection of a histogram of the absolute words per second articulation rate differences indicated that the data were not normally distributed; this was confirmed by a Shapiro-Wilk normality test (W = 0.9194, p < 0.001). Absolute words per second articulation rate difference was used as the dependent measure in a series of Wilcoxon rank sum tests for each condition, with Time (Early and Late) as the independent measure. A significant difference at a Bonferroni-corrected α-level of 0.0167 was found between the words per second articulation rate differences in the Early and Late portions of the Medium condition (W = 93, p < 0.001). However, no significant differences between Early and Late times were found in the Easy and Hard conditions (Easy: W = 52, p = 0.9118; Hard: W = 48, p = 0.9118), suggesting that difficulty as a whole was not affecting the dyad partners’ absolute differences in words per second articulation rates in the Receiving steps over time in a reliable way.
The mean absolute syllables per second speaking rate differences for the Receiving steps, Early and Late, are given in Table 5.11. Here, as in the words per second speaking rate analysis, the differences increased over time in the Easy and Hard conditions, but decreased over time in the Medium condition.

Table 5.11 Mean absolute syllables per second speaking rate differences in Early and Late portions of the conversation, Receiving steps only (standard deviations in parentheses)

         Condition
Time     Easy         Medium        Hard
Early    0.67 (0.61)  0.996 (0.76)  0.58 (0.43)
Late     0.95 (0.6)   0.69 (0.49)   0.88 (0.81)

As can be seen in Figure 5.15, there was a fair amount of by-dyad variation in the absolute syllables per second speaking rate differences in each condition. In the Easy condition, four of the five dyads showed an increase in their absolute differences between the Early and Late portions of their conversations, suggesting that the partners were becoming less similar in their speaking rates. In the Medium condition, three of the five dyads showed decreases in their absolute difference over time, suggesting that the talkers were becoming more similar in their speaking rates. In the Hard condition, three dyads showed increases in their absolute syllables per second speaking rate differences over time, while two dyads showed decreases.

Figure 5.15 Absolute Early and Late syllables per second speaking rate differences in each dyad, Receiving steps only, by Condition. Error bars indicate +/-1 standard error of the mean. Higher values indicate larger absolute differences between the dyad partners’ syllables per second speaking rate values in the Receiving steps.

Inspection of a histogram of the absolute syllables per second speaking rate differences indicated that the data were not normally distributed; this was confirmed by a Shapiro-Wilk normality test (W = 0.922, p < 0.001). Absolute syllables per second speaking rate difference was used as the dependent measure in a series of Wilcoxon rank sum tests for each condition, with Time (Early and Late) as the independent measure. No Bonferroni-corrected (p < 0.0167) significant differences between Early and Late times were found for the absolute syllables per second speaking rate differences in any of the three conditions (Easy: W = 35, p = 0.2799; Medium: W = 62, p = 0.393; Hard: W = 43, p = 0.6305). Once again, as in the words per second speaking rate analysis, difficulty was not affecting the absolute differences between the dyad partners’ syllables per second speaking rate values in the Receiving steps over time.

Turning finally to the absolute differences in syllables per second articulation rates, the mean values in the Early and Late parts of the conversation in each condition are given in Table 5.12. There was a large decrease in difference in the Medium condition over time, suggesting that those talkers were becoming more similar to their partners in syllable articulation rate over time, and a very small decrease in the Easy condition. However, there was an increase in the Hard condition over time, suggesting those talkers may have been becoming less similar to their partners in articulation rate.
Table 5.12 Mean absolute syllables per second articulation rate differences in Early and Late portions of the conversation, Receiving steps only (standard deviations in parentheses)

         Condition
Time     Easy         Medium       Hard
Early    0.58 (0.35)  1.07 (0.79)  0.55 (0.3)
Late     0.51 (0.49)  0.5 (0.47)   0.67 (0.48)

By dyad, as can be seen in Figure 5.16, three of the five dyads in the Easy condition showed a decrease in their absolute difference in syllables per second articulation rate over time, again suggesting that they were becoming more similar in their articulation rates between the Early and Late portions of the conversation. In the Medium condition, four dyads’ absolute differences decreased over time. In the Hard condition, on the other hand, three dyads’ absolute syllable articulation rate differences increased over time, suggesting the partners were becoming less similar in their rates.

Figure 5.16 Absolute Early and Late syllables per second articulation rate differences in each dyad, Receiving steps only, by Condition. Error bars indicate +/-1 standard error of the mean. Higher values indicate larger absolute differences between the dyad partners’ syllables per second articulation rates in the Receiving steps.

Inspection of a histogram of the absolute syllables per second articulation rate differences indicated that the data were not normally distributed; this was confirmed by a Shapiro-Wilk normality test (W = 0.8699, p < 0.001). Absolute syllables per second articulation rate difference was used as the dependent measure in a series of Wilcoxon rank sum tests for each condition, with Time (Early and Late) as the independent measure. No Bonferroni-corrected (p < 0.0167) significant differences between Early and Late times were found for the absolute syllables per second articulation rate differences in any of the three conditions (Easy: W = 61, p = 0.4359; Medium: W = 76, p = 0.05243; Hard: W = 47, p = 0.8543). Difficulty was not affecting the absolute differences between the dyad partners’ syllables per second articulation rates in the Receiving steps over time.

Overall, task difficulty was not found to globally affect talkers’ speaking or articulation rates when the absolute differences in the Receiving steps were examined over time.

5.3.3 Time-series cross-correlation of speaking and articulation rates

Following Pardo et al. (2010, 2013a), the words per second and syllables per second speaking and articulation rates for the talkers in each dyad were submitted to time-series cross-correlation analysis. Separate analyses were conducted for each step (i.e., in which one talker was Giving instructions and the other was Receiving), for Giving steps only, and for Receiving steps only.
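The mechanics of the cross-correlation analysis can be sketched as follows. This is a minimal Python illustration, not the code used for the analyses reported here; the significance criterion (a coefficient exceeding roughly 2 divided by the square root of the number of overlapping steps, the conventional large-sample cutoff) is assumed here for illustration and may not match the exact criterion of the original analyses.

```python
# Hedged sketch of time-series cross-correlation between dyad partners'
# per-step rates. For each lag k, talker A's series is correlated with
# talker B's series shifted by k steps. The |r| > 2/sqrt(n) cutoff is
# the conventional large-sample approximation to a p < .05 test.
import numpy as np

def cross_correlate(x, y, max_lag):
    """Return {lag: (r, significant)} for lags -max_lag .. +max_lag."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    results = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag < 0:            # pair x[t] with y[t + lag], lag negative
            a, b = x[-lag:], y[:lag]
        elif lag > 0:          # pair x[t] with y[t + lag], lag positive
            a, b = x[:-lag], y[lag:]
        else:
            a, b = x, y
        r = np.corrcoef(a, b)[0, 1]
        threshold = 2 / np.sqrt(len(a))
        results[lag] = (r, abs(r) > threshold)
    return results

rate_a = [3.9, 4.1, 3.6, 4.4, 4.0, 3.8, 4.2, 3.7]   # hypothetical rates
rate_b = [3.2, 3.5, 3.1, 3.6, 3.5, 3.6, 3.9, 3.6]
for lag, (r, sig) in cross_correlate(rate_a, rate_b, max_lag=3).items():
    print(f"lag {lag:+d}: r = {r:+.3f}{' *' if sig else ''}")
```

Under Pardo et al.’s (2010, 2013a) interpretation, a significant positive coefficient would be read as convergence and a significant negative coefficient as divergence.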
5.3.3.1 Words per second speaking and articulation rate cross-correlation

Table 5.13 presents the results of the cross-correlation analyses for the words per second speaking rate values for each dyad. In the Easy condition, only three of the five dyads showed a significant (p < .05) coefficient in any of the analyses; this included a significant negative coefficient for E3 in the Giving steps analysis, which Pardo et al. (2010, 2013a) would take to indicate divergence in speech rates between the talkers, and significant negative coefficients for E4 in the ‘each step’ analysis and the Receiving steps analysis. In the Medium condition, again, only three of the five dyads showed any significant (p < .05) results, including significant negative coefficients for M1 in the Giving steps analysis, and for M3 and M5 in the ‘each step’ analysis. In the Hard condition, once again, only three of the five dyads showed at least one significant (p < .05) coefficient, including a significant negative coefficient for H4 in the ‘each step’ analysis.

Table 5.13 Time-series cross-correlation results for words per second speaking rates. Note: Indicated coefficients are significant at p < .05. ‘--’ indicates no significant coefficients. Number in parentheses indicates steps of lag in each analysis.

Dyad   Each Step                Giving Steps           Receiving Steps
E1     -- (19)                  -- (13)                -- (13)
E2     -- (19)                  0.744 at lag -1 (13)   -- (13)
E3     -- (19)                  -0.753 at lag 0 (13)   -- (13)
E4     -0.585 at lag 5 (19)     -- (13)                -0.690 at lag 2 (13)
E5     -- (19)                  -- (13)                -- (13)
M1     -- (15)                  -0.827 at lag -3 (9)   -- (9)
M2     -- (15)                  -- (9)                 -- (9)
M3     -0.562 at lag 6 (15)     -- (9)                 -- (9)
M4     -- (15)                  -- (9)                 -- (9)
M5     -0.597 at lag 1 (15)     -- (9)                 -- (9)
H1     -- (13)                  -- (7)                 -- (7)
H2     -- (13)                  -- (7)                 -- (7)
H3     0.641 at lag -1 (13)     -- (7)                 -- (7)
H4     -0.685 at lag 0 (13)     -- (7)                 -- (7)
H5     0.628 at lag -1 (13)     -- (7)                 -- (7)

Table 5.14 shows the cross-correlation results for the words per second articulation rates. In the Easy condition, only two significant (p < .05) coefficients were found in total, both of which were positive: one in the Receiving steps for dyad E2, and one in the Giving steps for dyad E5. In the Medium condition, only three significant coefficients were found: one for dyad M4 in the Giving steps, and one each in the ‘each step’ analysis and the Receiving steps analysis for dyad M5, both of which were negative. All dyads in the Hard condition had at least one significant coefficient in their ‘each step’ analysis; dyads H2 and H3 each had both one positive coefficient and one negative coefficient in those analyses. Dyad H1 also had one significant negative coefficient in the Receiving steps analysis.

Table 5.14 Time-series cross-correlation results for words per second articulation rates. Note: Indicated coefficients are significant at p < .05. ‘--’ indicates no significant coefficients. Number in parentheses indicates steps of lag in each analysis.

Dyad   Each Step                                 Giving Steps           Receiving Steps
E1     -- (19)                                   -- (13)                -- (13)
E2     -- (19)                                   -- (13)                0.713 at lag -3 (13)
E3     -- (19)                                   -- (13)                -- (13)
E4     -- (19)                                   -- (13)                -- (13)
E5     -- (19)                                   0.836 at lag -1 (13)   -- (13)
M1     -- (15)                                   -- (9)                 -- (9)
M2     -- (15)                                   -- (9)                 -- (9)
M3     -- (15)                                   -- (9)                 -- (9)
M4     -- (15)                                   0.808 at lag -1 (9)    -- (9)
M5     -0.802 at lag 7 (15)                      -- (9)                 -0.876 at lag -3 (9)
H1     -0.626 at lag 1 (13)                      -- (7)                 -0.920 at lag 0 (7)
H2     -0.653 at lag 0 and 0.698 at lag 1 (13)   -- (7)                 -- (7)
H3     -0.619 at lag -4 and 0.745 at lag -3 (13) -- (7)                 -- (7)
H4     0.688 at lag 1 (13)                       -- (7)                 -- (7)
H5     0.615 at lag 1 (13)                       -- (7)                 -- (7)

The pattern in the ‘each step’ analysis for dyads H2 and H3 – i.e., where there is a negative coefficient at an even-numbered lag and a positive coefficient at an odd-numbered lag – may reflect the differences in articulation rates in the Giving and Receiving steps discussed in section 5.3.1.1 above.
The even intervals of lag are those in which the first talker’s Giving steps were correlated with the second talker’s Receiving steps, and vice versa; recall that the mean words per second articulation rate for the Receiving steps was noticeably higher than that for the Giving steps (3.78 for Receiving, 3.47 for Giving). At those intervals, then, one talker’s high Receiving articulation rates were paired with the other’s lower Giving articulation rates, likely leading to a negative correlation. The odd intervals are those in which the talkers’ lower-articulation-rate Giving steps were correlated with each other and in which their higher-articulation-rate Receiving steps were correlated, which would lead to a positive correlation. However, other than this, no systematic patterns of correlation were found for the words per second speaking rate or articulation rate analyses.

5.3.3.2 Syllables per second speaking and articulation rate cross-correlation

Table 5.15 presents the results of the cross-correlation analyses for the syllables per second speaking rates for each dyad. In the Easy condition, four of the five dyads showed at least one significant (p < .05) coefficient, including a significant negative coefficient (-0.779 at lag 0) for dyad E3 in the Giving steps analysis. In the Medium condition, again, four of the five dyads showed at least one significant (p < .05) coefficient; almost all of these were significant negative coefficients, which according to Pardo et al.’s (2010, 2013a) analysis would indicate divergence in rates between talkers. Dyad M3 showed five significant coefficients in the ‘each step’ analysis, which followed the pattern of alternating negative and positive coefficients seen in dyads H2 and H3 in their words per second articulation rate ‘each step’ analyses. Four of the five dyads also showed significant (p < .05) coefficients in the Hard condition; one of these was a significant negative coefficient for dyad H2 in the Receiving steps analysis.

Table 5.15 Time-series cross-correlation results for syllables per second speaking rates. Note: Indicated coefficients are significant at p < .05. ‘--’ indicates no significant coefficients. Number in parentheses indicates steps of lag in each analysis.

Dyad   Each Step                                                                            Giving Steps           Receiving Steps
E1     -- (19)                                                                              -- (13)                -- (13)
E2     -- (19)                                                                              0.766 at lag -1 (13)   -- (13)
E3     -- (19)                                                                              -0.779 at lag 0 (13)   -- (13)
E4     0.546 at lag 1 (19)                                                                  -- (13)                -- (13)
E5     0.560 at lag 5 (19)                                                                  0.673 at lag -1 (13)   -- (13)
M1     -- (15)                                                                              -0.838 at lag -3 (9)   -0.823 at lag -1 (9)
M2     -0.647 at lag 2 (15)                                                                 -- (9)                 -- (9)
M3     -0.692 at lag 0, 0.579 at lag 1, -0.562 at lag 2, 0.570 at lag 5, and -0.585 at lag 6 (15)   -- (9)        -- (9)
M4     -- (15)                                                                              -- (9)                 -- (9)
M5     -0.577 at lag 1 (15)                                                                 -- (9)                 -- (9)
H1     0.647 at lag 5 (13)                                                                  -- (7)                 -- (7)
H2     -- (13)                                                                              -- (7)                 -0.897 at lag 1 (7)
H3     0.704 at lag -1 (13)                                                                 -- (7)                 -- (7)
H4     -- (13)                                                                              -- (7)                 -- (7)
H5     0.628 at lag -1 (13)                                                                 -- (7)                 -- (7)

Table 5.16 shows the cross-correlation results for the syllables per second articulation rates. In the Easy condition, four of the five dyads showed at least one significant (p < .05) coefficient in the analyses; most of these were negative coefficients, which Pardo et al. (2010, 2013a) would take to indicate talkers were diverging in speech rate. In the Medium condition, dyads M2 and M3 showed the same alternating negative and positive significant coefficient pattern in their ‘each step’ analysis which was found in previous analyses.
M5 also showed two significant negative coefficients, one in the ‘each step’ analysis and one in the Receiving steps analysis. In the Hard condition, dyads H2 and H3 again showed the alternating negative/positive coefficient pattern in their ‘each step’ analysis. Dyad H5 showed one significant positive coefficient in the ‘each step’ analysis, and dyad H4 showed one significant positive coefficient in the Giving steps analysis.

Table 5.16 Time-series cross-correlation results for syllables per second articulation rates. Note: Indicated coefficients are significant at p < .05. ‘--’ indicates no significant coefficients. Number in parentheses indicates steps of lag in each analysis.

Dyad   Each Step                                                                              Giving Steps           Receiving Steps
E1     -0.680 at lag 0 (19)                                                                   -- (13)                -- (13)
E2     -- (19)                                                                                -0.658 at lag -3 (13)  -- (13)
E3     -- (19)                                                                                -- (13)                -- (13)
E4     -0.555 at lag -2 and 0.495 at lag 5 (19)                                               -- (13)                -- (13)
E5     0.488 at lag -3, -0.526 at lag 2, and -0.577 at lag 4 (19)                             0.690 at lag -1 (13)   -- (13)
M1     -- (15)                                                                                -- (9)                 -- (9)
M2     0.586 at lag -1, -0.797 at lag 0, 0.728 at lag 1, and -0.612 at lag 2 (15)             -- (9)                 0.848 at lag 0 and -0.848 at lag 2 (9)
M3     -0.847 at lag 0, 0.603 at lag 1, -0.644 at lag 2, and 0.721 at lag 3 (15)              -- (9)                 -- (9)
M4     -- (15)                                                                                -- (9)                 -- (9)
M5     -0.661 at lag 7 (15)                                                                   -- (9)                 -0.803 at lag -3 (9)
H1     -- (13)                                                                                -- (7)                 -- (7)
H2     -0.730 at lag -2, 0.644 at lag -1, -0.808 at lag 0, and 0.710 at lag 1 (13)            -- (7)                 -- (7)
H3     0.754 at lag -3, -0.673 at lag -2, 0.769 at lag -1, -0.815 at lag 0, and 0.657 at lag 1 (13)   0.911 at lag -2 (7)   -- (7)
H4     -- (13)                                                                                0.877 at lag 1 (7)     -- (7)
H5     0.641 at lag 1 (13)                                                                    -- (7)                 -- (7)

As in the words per second analyses, other than the alternating negative/positive pattern due to the Giving/Receiving asymmetry in the ‘each step’ analysis, no systematic patterns of correlations were found in either speaking rate or articulation rate for the syllables per second measures.

5.4 Discussion

This chapter built on the previous findings that increased cognitive workload can lead to increased speech rate (Griffin & Williams, 1987; Lively et al., 1993; Brenner et al., 1994; Scherer et al., 2002) and that talkers will sometimes, but not always, converge on the speech rate of a task partner, interviewer, or model talker (Street Jr., 1982; Jungers & Hupp, 2009; Pardo et al., 2010, 2013a; Levitan & Hirschberg, 2011). Looking at both articulation rate – i.e., only using the time that the talkers required to articulate words and syllables, and not including pauses, coughs, laughter, etc. – and speaking rate – using all the time that talkers spent ‘holding the floor’ in a step – the analyses in this chapter explored whether the talkers in the construction task described in Chapter 2 (a) would show an increased speaking and/or articulation rate in the more difficult conditions, (b) would become more similar over time in their speaking and/or articulation rates, as shown by a decrease in the difference in their rates over the course of the conversation and/or by a correlation in their rates over time, and (c) if they became more similar in speaking and/or articulation rates, whether those changes in similarity were affected by the difficulty of their task.
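The distinction between the two rates can be summarized computationally: speaking rate divides the word or syllable count by the talker’s entire turn time in a step, while articulation rate divides it by the turn time minus pause and other non-speech time. The sketch below is a hypothetical illustration of this arithmetic, not the analysis code, and its function names and values are illustrative.

```python
# Hedged sketch of the speaking rate / articulation rate distinction.
# Speaking rate uses the full time a talker held the floor in a step;
# articulation rate excludes pauses, coughs, laughter, and other
# non-speech time. Values are hypothetical.
def speaking_rate(units, turn_time_s):
    """units: number of words or syllables produced in the step."""
    return units / turn_time_s

def articulation_rate(units, turn_time_s, non_speech_time_s):
    """non_speech_time_s: total pause/cough/laughter time in the step."""
    return units / (turn_time_s - non_speech_time_s)

# A step with 58 syllables over a 20 s turn, 4.5 s of it non-speech:
syllables, turn, non_speech = 58, 20.0, 4.5
print(f"speaking rate:     {speaking_rate(syllables, turn):.2f} syll/s")
print(f"articulation rate: "
      f"{articulation_rate(syllables, turn, non_speech):.2f} syll/s")
```

The same counts therefore always yield an articulation rate at least as high as the speaking rate, which is why a talker’s pausing behaviour can move the two measures apart.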
In regards to task difficulty, the results showed that the talkers in the more difficult conditions – Medium and Hard – did not have a significantly higher speech rate than those in the Easy condition. The average overall words per second and syllables per second speaking and articulation rates were generally higher in the Hard condition than in the other two conditions, but there were also more outliers in that condition. It is possible that a larger sample size could increase the reliability of these patterns.

While there was no clear difference in speech rate between the conditions, there was a difference based on what role the talkers were playing in the task in a given step: talkers had higher rates in steps in which they were receiving instructions than in those in which they were giving instructions, particularly in terms of syllables per second speaking and articulation rates. In those Receiving steps, speech rates were again highest in the Hard condition, although the difference from the other conditions was not significant. The reasons posited for speech rate increase by Brenner et al. (1994) and Lively et al. (1993) do not fully explain this finding. In the case of Brenner et al. (1994), who proposed that the increase in speech rate is related to physiological changes (e.g., increased heart rate) brought on by psychological stress due to increased workload, it is likely that Receivers experienced some increase in cognitive workload when trying to correctly interpret and follow Givers’ instructions. However, it would seem that ensuring one gives accurate instructions could also be a cause of increased cognitive workload, so it is not clear why Givers would not also show an increased speech rate. In the case of Lively et al.’s (1993) explanation using Lindblom’s (1990) ‘hyperspeech and hypospeech’ model of phonetic variation, if Receivers are trying to return their attention to the task as quickly as possible, then a speech rate increase due to the influence of hyperspeech again seems reasonable. However, Givers should also be engaging in hyperspeech, as they should be enhancing the intelligibility of their utterances to maximize the chances that the Receivers will understand them; this would suggest that they should also show a speech rate increase. It is also possible that both Givers and Receivers might maximize their intelligibility by producing hyperarticulated vowels, which could actually lead to a slower speech rate. For the moment, the reason for the Giving/Receiving speech rate differences will remain a question for further study.

In terms of talkers’ changes in speech rate similarity over time, which could indicate speech convergence if increased similarity is regularly occurring, almost no significant results were found when partners’ similarity in speaking and articulation rates was examined in terms of task difficulty, using both absolute difference measures and time-series cross-correlation analysis, and no clear patterns were found across the conditions. In the analysis of changes in absolute speech rate differences, the most consistent patterns were found in the Medium condition. In all 12 measures (words per second and syllables per second speaking and articulation rates in all steps, Giving steps only, and Receiving steps only), the mean absolute difference decreased over time; however, only one of those decreases – in the words per second articulation rate in the Receiving steps – was statistically significant. In 10 of those 12 measures, at least four of the five dyads – including M2, whose absolute differences decreased over time in all measures – showed an absolute difference decrease over time.
In the Hard condition, the mean absolute difference increased over time in nine of the 12 measures, albeit never in a statistically significant way; there was no clear pattern when the individual dyads were examined in each measure. In the Easy condition, six measures showed an increase in mean absolute rate difference, with the others showing a mix of decreases and relative stability; no significant differences were found in any of the measures. Again, the lack of significant differences may be due to the small sample size in this experiment; with more dyads in each condition, it is possible that the patterns seen in the Medium and Hard conditions could strengthen.

This lack of a systematic pattern of increasing or decreasing similarity over time does echo the findings of both Pardo et al. (2010, 2013a) and Levitan and Hirschberg (2011). Some dyads seemed to consistently become more similar in speech rate, regardless of the measurement used (dyads M1 and M2 in the Medium condition, dyad H1 in the Hard condition), while others consistently became less similar in speech rate (dyads E1 and E2 in the Easy condition, dyads H3 and H4 in the Hard condition); the rest showed convergence or divergence depending on the measurement used. As Hecker et al. (1968) suggest, it may be the case that not all talkers display the same kinds of speech production changes when cognitive workload increases, but that talkers are very internally consistent in how their behaviour changes. It may also be useful to explore other factors which might explain the dyads’ speech rate patterns. For example, Scherer et al. (2002) found that the talkers in their study who self-reported higher levels of stress displayed a higher speech rate. This suggests that perhaps talkers with a higher Neuroticism score in the Big 5 personality traits (John et al., 1991, 2008; Benet-Martinez & John, 1998), who would be predicted to have higher levels of stress in difficult situations, would display higher speech rates in high cognitive workload conditions than those with lower Neuroticism scores. Alternately, a talker with a high Extraversion score – for which one of the self-report indicators is “I see myself as someone who is talkative” (John et al., 2008, p. 157) – might be expected to display a higher speech rate in most conditions than someone with a lower Extraversion score, which might then affect their possible speech rate changes under higher task difficulty conditions; if someone has an already high speech rate, how much higher could they conceivably go in a high-workload situation? This exploration will be left for further research.

In the time-series cross-correlation analyses, again, no clear patterns of convergence or divergence were found; per Pardo et al.’s (2010, 2013a) analysis, positive correlations would indicate speech convergence, while negative correlations would indicate speech divergence. In the words per second speaking rate and articulation rate measures, six of the 15 dyads showed no significant correlations in any of the three analyses: ‘each step’, Giving steps only, or Receiving steps only. For the syllables per second speaking rate measures, three dyads showed no significant correlations, while four dyads did not show any significant correlations for their syllables per second articulation rate measures.
Most correlations which were found were in the ‘each step’ analysis – that is, when one talker’s Giving steps were correlated with their partner’s Receiving steps – with a combination of negative and positive coefficients being observed, and sometimes an alternation between negative and positive coefficients being seen in adjacent steps of lag in a single dyad’s time-series correlations (e.g., dyads H2 and H3 in the articulation rate measures). The negative coefficients in the ‘each step’ analyses seem to be at least partly an effect of the differences in speaking and articulation rates in the Giving and Receiving steps; as mean speech rates were higher in the Receiving than in the Giving steps, pairings of Receiving and Giving rates would be likely to show negative correlations, as the high Receiving rates would consistently be paired over time with the low Giving rates. It is thus not necessarily the case that negative coefficients reflect divergence on the part of the talkers, as Pardo et al. (2010, 2013a) suggest; in many of the cases in the ‘each step’ analysis, it may simply be a result of the underlying differences in speech rates between Giving and Receiving steps. It is also the case that a negative coefficient could indicate that talkers are converging, in the case where one talker’s speech rate decreases systematically over time while the other’s increases. By comparison, pairings of low-speech-rate Giving steps over time would tend to be positively correlated, as would pairings of high-speech-rate Receiving steps. As well, most of the significant correlations in the ‘each step’ analysis were found in the Medium and Hard conditions, for which the differences between the Giving and Receiving speech rates tended to be largest. Because of the asymmetry in speech rates, patterns of correlation in the Giving steps only and Receiving steps only analyses could have been more useful in assessing talkers’ potential convergence or divergence: however, only a few significant correlations were found in any of the measures, and no consistent patterns were observed. Time-series cross-correlation of dyad partners’ speech rates, then, displayed the same lack of systematicity as did the absolute difference analyses; again, it is possible that a larger sample size would lead to more reliable patterns, but it may also simply be the case that task difficulty does not affect whether talkers become more similar to their partners in speech rate over time.

In the next chapter, we will turn from an analysis of speech rate to an analysis of pause rate and pause percentage, and whether task difficulty has an effect on convergence in that domain. Given that talkers’ silent pausing behaviour is one of the elements that comprise their speaking rate (see Grosjean & Lane, 1976, and Miller & Grosjean, 1981), Chapter 6 is in some ways an extension of the current chapter. Chapter 6 also includes an analysis of filled pausing behaviour, which has been widely discussed as a potential indicator of task difficulty (e.g., Schachter et al., 1991; Oviatt, 1995; Bortfeld et al., 2001; Clark & Fox Tree, 2002).

Chapter 6: Task difficulty and pausing convergence

6.1 Introduction

Just as speaking rate has been explored in relation to both speech convergence and the effect of task difficulty on speech production, so have talkers’ pausing patterns, both for silent pauses and for filled pauses (i.e., utterances such as ‘uh’, ‘um’, and ‘mm’).
Silences are an intrinsic part of speech, as they are necessary results of breathing and stop consonant articulation, and are one of the components which make up listeners’ perceptions of talkers’ speech rates (e.g., Grosjean & Lane, 1976; Miller & Grosjean, 1981). Given the role of silence in speech rate, it is possible that a talker who converges to an interlocutor on speech rate may do so through modifying their use of silent pauses instead of modifying the rate at which they produce words. Increased numbers or durations of silent pauses are often taken to be indicative of difficulties in speech production, including difficulties in areas which are often associated with increased cognitive workload such as utterance planning and word retrieval (e.g., Henderson et al., 1966; Butterworth, 1975; Brennan & Williams, 1995; Schilperoord, 2002).

The use of filled pauses is likewise often associated with difficulties such as planning utterances and making lexical choices (e.g., Schachter et al., 1991; Oviatt, 1995; Bortfeld et al., 2001; Clark & Fox Tree, 2002). Producing longer and more complex sentences (Oviatt, 1995; Bortfeld et al., 2001) has been found to lead to a greater use of filled pauses than producing shorter sentences. Similarly, having a greater range of options in speech planning – such as a larger vocabulary from which to select words (Schachter et al., 1991) or more choices in how to complete an oral task (Christenfeld, 1994) – leads to greater filled pause use as well. The choice of filled pauses rather than silent ones may be a signal to listeners that a talker is having difficulty, and may indicate a talker’s desire to ‘hold the floor’ despite those difficulties (e.g., Schachter et al., 1991; Brennan & Williams, 1995; Bortfeld et al., 2001; Brennan & Schober, 2001; Clark & Fox Tree, 2002). Indeed, Clark and Fox Tree (2002) propose that ‘uh’ and ‘um’ are used to signal different kinds of delays in speaking (‘uh’ for minor delays, ‘um’ for major delays); however, this proposal has been questioned by O’Connell and Kowal (2005), who found that filled pauses were not regularly followed by any speech delay.

An added factor in studying the use of filled pauses is the potential influence of sociolinguistic variation. It has been found that older speakers use more filled pauses than younger speakers (Bortfeld et al., 2001; Tottie, 2011; Laserna et al., 2014), that men use more filled pauses than women (Bortfeld et al., 2001; Tottie, 2011), and that, at least in the United Kingdom, people in jobs with a higher socioeconomic status use more filled pauses than those with lower status jobs (Tottie, 2011). It does not appear to be the case that the groups which use more filled pauses are necessarily using longer sentences than the groups using fewer filled pauses. For example, in Bortfeld et al. (2001), older talkers (mean age 67 years) had a higher rate of filled pauses per 100 words than did middle-aged talkers (mean age 48 years), despite the two groups producing approximately the same amount of speech overall; the same was true for men and women in that study. There also appear to be sociolinguistic differences in use between ‘uh’ and ‘um’, where women, young people, and people of higher socioeconomic status use ‘um’ more than do men, older people, and people of lower socioeconomic status (Liberman, 2005, 2014; Tottie, 2011).
As well, there are potential audience design (Bell, 1984) effects of using filled pauses; that is, talkers may adapt their speech based on how they think listeners will view them before they receive any input from the listeners. Talkers who use filled pauses while speaking in public are often viewed as being nervous or poorly prepared (Christenfeld, 1995; Erard, 2004), even though it has been found that filled pause use does not increase with anxiety (Kasl & Mahl, 1965) and that listeners who pay attention to talkers’ speech content rather than style do not notice how many filled pauses a talker uses (Christenfeld, 1995). Thus, the use of filled pauses is an area of speech that seems to be affected by both cognitive workload and the types of social factors – both talker-internal and listener-oriented – that often lead to speech accommodation (Giles, 1973; Giles et al., 1973; Giles & Coupland, 1991; Giles et al., 1991; Giles & Ogay, 2007).

The effect of workload-induced task difficulty on pauses in speech production has been examined in Khawaja (2010) and Khawaja et al. (2008), which explored the possibility of using pausing behaviour changes as an indicator of excess cognitive workload in human-computer interactions. In these studies, participants completed both a low-load task and a high-load task. In the low-load task, participants were asked to read short passages aloud and then answer comprehension questions on those passages, also aloud. In the high-load task, participants were asked to also monitor a series of numbers being presented aurally over headphones while they were reading the passages and answering the comprehension questions. Khawaja et al. (2008) and Khawaja (2010) both compared how much time talkers spent pausing as a percentage of their overall speech time, including both passage-reading speech and question-answering speech, in the high-load and low-load conditions. They also compared these measures for filled pauses only and for silent pauses only. They found that, while no effects were found for filled pauses, both the overall percentage of time pausing and the percentage of time spent on silent pauses were significantly higher in the high-load condition than in the low-load condition. Khawaja (2010) additionally looked at pause length (mean filled pause length and mean silent pause length) and frequency of pauses (overall, for filled pauses, and for silent pauses). It was found that both average silent pause length and average filled pause length were significantly higher in the high-workload condition than in the low-workload condition. No effects of workload were found on the pause frequency measures. This work thus reinforces the psycholinguistic findings (in e.g., Schachter et al., 1991; Oviatt, 1995; Bortfeld et al., 2001) which suggest that talkers pause more when the conditions under which they are speaking become more difficult, whether due to internal (e.g., planning) or external (e.g., attentional) factors.

In terms of convergence, pausing behaviour is one of many areas of speech that talkers could converge on, and one that is perhaps better explored in conversational contexts than in word list-based auditory naming contexts (Goldinger, 1998, inter alia) due to the need for speech material surrounding the pauses. Differing results have been found when convergence has been explored in silent pauses and in filled pauses.
In terms of silent pauses, as part of a wide-ranging study of non-lexical convergence in conversation, Bilous and Krauss (1988) examined the frequency of short (less than one second) and long silent pauses in interactions between same-gender dyads and between mixed-gender dyads. Unlike other convergence studies (e.g., Kim et al., 2011; Kim, 2012; Pardo, 2006), this one did not examine patterns of convergence in a single interaction over time; rather, talkers’ differences in convergence between interactions were measured. They found that males used more silent pauses, both short and long, than females in same-gender dyads, and that both females and males converged on silent pause frequency in mixed-gender dyads, with females increasing and males decreasing their short and long pause rates. Males showed a greater tendency towards convergence than did females.

In terms of filled pauses, a study by Pardo et al. (2013a) found that talkers in a task-based dyadic interaction in fact diverged from their partners in their filled pausing rates. This study used the Map Task (Anderson et al., 1991), with one partner giving instructions and the other receiving, and had the talkers switch roles throughout the interaction – i.e., a talker gave instructions for one map, received instructions on the next, and so on. Pardo et al. (2013a) predicted that Givers would show higher filled pause rates than Receivers, due both to a desire to hold the floor while giving instructions and to difficulties related to utterance planning in the instruction-giving task. However, it was found that Givers and Receivers used filled pauses at the same rate when in their original roles, but when they switched roles, the Original Givers/New Receivers increased their pause rate, and consequently used more filled pauses than did the Original Receivers/New Givers (who correspondingly decreased their filled pause rate). Thus, in the ‘switched’ tasks, the New Receivers used more filled pauses than the New Givers did. This does not seem to follow from the proposals discussed above (e.g., Schachter et al., 1991; Oviatt, 1995; Bortfeld et al., 2001), where filled pause use is related to speech production difficulties such as utterance planning and word retrieval, and in particular from Bortfeld et al.’s (2001) findings that directors (the equivalent of Givers) in a matching task had more filled pauses than matchers did. Pardo et al. (2013a) nevertheless suggest that their filled pause rate findings could still be due to task difficulty: in particular, that Givers-turned-Receivers have a higher filled pause rate because they are trying to “hold the conversational floor when they [meet] difficulty resulting from having less task-related information” (2013a, p. 287) than they did previously. These observations reinforce the suggestion that task difficulty could have an effect on how talkers use filled pauses, and also suggest that their role in a task might have an effect on that use.

Given that task difficulty has been shown to have an effect on pausing behaviour, and given that talkers have displayed a variety of convergence and divergence patterns when it comes to pausing, this chapter explores the question of whether task difficulty has an effect on talkers’ tendency to pause and to converge in their pausing patterns, by examining whether the talkers in the construction task corpus

(1) showed different filled and silent pause rates and percentages depending on the difficulty condition they were in.
(2) became more similar in pause rates and percentages over the course of their interaction, as indicated by decreasing absolute differences in dyads’ pausing rates and percentages over time. Increased similarity would suggest that the talkers were converging in their pausing behaviour.

(3) showed different patterns of changes in similarity depending on the task difficulty condition they were in.

Section 6.2 describes the methods used in this examination; Section 6.3 presents the results; Section 6.4 discusses the results.

6.2 Methods

As was the case with speech rate, a number of measures have been used to examine pause rate. In terms of filled pauses, Pardo et al. (2013a) used filled pause rate (filled pauses/second) as their measure, following Bortfeld et al. (2001). In terms of silent pauses, Bilous and Krauss (1988) used frequency of silent pauses as their measure, separating out short pauses (one second or less) and long pauses (more than one second). However, there is no indication of the unit of time used – i.e., whether it was the number of pauses during the conversation that was counted, or the number of pauses per minute of conversation, or something else. Khawaja (2010) used several measures for both filled and silent pauses: length of pauses, percentage of pausing time in total speaking time, and pause frequency in each 30-second interval of speech (normalized pause frequency).

In the analysis described here, pause rate – number of pauses per second – and pause percentage – the amount of time taken up by pauses in the total turn time in each step – were selected as the measurements. Both measurements were included because talkers with similar pausing rates could have very different pausing percentages, and vice versa. Consider, for example, two of the talkers in the Easy condition, who had the steps with the 11th and 12th lowest silent pause rates out of the 400 total steps in the corpus. The average silent pause rate for the entire corpus was 0.39 pauses per second (SD = 0.13); that is, not taking into account the length of the pauses, there was somewhat more than one-third of a pause occurring every second. Talker 111 in dyad E3 had a silent pause rate in her eighth receiving step of 0.13 pauses per second. Talker 125 in dyad E4 had a silent pause rate in her fifth receiving step of 0.14 pauses per second. However, while their silent pause rates were similar, their silent pause percentages were not. The average silent pause percentage for the whole corpus was 19.9% (SD = 7.9%); that is, the talkers were typically not producing speech for nearly one-fifth of the time in which they held the floor. For talker 125, who had three silent pauses totalling 1.054 seconds in a step that was 22 seconds long, the resulting silent pause percentage was 4.8%; this was the eighth lowest silent pausing percentage in the corpus. On the other hand, Talker 111 only had one silent pause, which was 0.967 seconds long, in a step which was 7.6 seconds long. This meant that 12.7% of the step was spent on silent pausing, which was the 68th lowest silent pausing percentage out of the 400 steps in the corpus. Because there is no hard link between pause rate and pause percentage, talkers could conceivably converge with their partners on either measure, on both, or on neither; thus, both measures were included for both filled and silent pauses.
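The two measures can be sketched directly; the numbers below reproduce the Talker 111 example above (one 0.967-second silent pause in a 7.6-second step), and the function names are hypothetical.

```python
# Minimal sketch of the two pause measures used in this chapter: pause
# rate (pauses per second of turn time) and pause percentage (pause
# time as a share of turn time). Function names are hypothetical.
def pause_rate(pause_durations_s, turn_time_s):
    return len(pause_durations_s) / turn_time_s

def pause_percentage(pause_durations_s, turn_time_s):
    return 100 * sum(pause_durations_s) / turn_time_s

# Talker 111's step described above: one 0.967 s silent pause in a
# 7.6 s step yields a low pause rate but a fairly high pause percentage.
pauses, turn = [0.967], 7.6
print(f"rate: {pause_rate(pauses, turn):.2f} pauses/s")      # ~0.13
print(f"percentage: {pause_percentage(pauses, turn):.1f}%")  # ~12.7%
```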
Silent and filled pauses were identified using the lists of words and non-word elements generated for each step, as described in section 5.2.1. Filled pauses were primarily transcribed using ‘uh’ and ‘um’, although a few transcriptions of ‘ah’, ‘er’, and ‘mm’ were also used based on talkers’ individual pronunciations.15

15 On the rare occasions when there was a question as to whether a listed item was a filled pause or some other part of speech – e.g., something transcribed as ‘ah’ or ‘mm’ could indicate a filled pause, a moment of insight (e.g., ‘Ah! Yes, I see what you’re saying!’), or a backchannel (e.g., ‘Do you understand?’ ‘Mm.’) – the audio file was checked for disambiguation based on surrounding lexical context and on intonation. Filled pauses were found to typically have an extended level intonation, sometimes with a terminal rise; non-pauses typically either displayed terminal intonation or were part of the broader intonation pattern of a clause.

As in the analysis of speech rate, all inter-turn pauses were removed to calculate the total turn time for each step for each talker. For the silent pauses, following Pardo et al. (2010, 2013a), only those silences and breaths which were longer than 250 ms were counted as pauses; silences and breaths 250 ms in length or shorter were taken to be part of the articulation process (see the discussion in Grosjean & Lane, 1976, and Miller & Grosjean, 1981). Noises and coughs were not included in the silent pause rate (silent pauses per second) and silent pause percentage calculations. All filled pauses were included in the filled pause rate (filled pauses per second) and filled pause percentage calculations. Unlike in the speech rate analysis, rates and percentages were only calculated in terms of total turn time.

As in the speech rate analysis described in Chapter 5, mean pause rate and pause percentage for each talker were used to determine whether there were differences in pause rate and/or pause percentage (both silent and filled) between the three conditions. Then, the absolute differences between pause rate values and between pause percentage values for dyad partners early and late in the conversation were measured; if the talkers were becoming more similar – which would suggest that convergence was occurring – these differences should be smaller later in the conversation than they were earlier in the conversation. These absolute differences were measured in three ways:

- in each step; i.e., between one talker’s Giving pause rate/percentage and the other talker’s Receiving pause rate/percentage for Step One, Step Two, etc. This is similar to the method used in Pardo et al. (2013a), in which Givers’ filled pause rates were compared to those of Receivers.

- between the pause rates/percentages in the Giving steps for each partner. In this instance, the talkers’ Giving steps were paired in order of occurrence. Thus, the first pairing would have Talker 1’s Giving pause rate/percentage in Step 1 (their first Giving step) and Talker 2’s Giving pause rate/percentage in Step 2 (their first Giving step). The second pairing would have Talker 1’s Giving pause rate/percentage in Step 3 (their second Giving step) and Talker 2’s Giving pause rate/percentage in Step 4 (their second Giving step).

- between the pause rates/percentages in the Receiving steps for each partner, in the same way as for the Giving steps.

These differences were measured for each step in the first instance, for each Giving step pairing in the second instance, and for each Receiving step pairing in the third instance; a sketch of the three pairing schemes is given below.
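The following sketch builds the three pairings from two talkers’ per-step values, using the alternation described above (Talker 1 gives the odd-numbered steps, Talker 2 the even-numbered ones); the function and variable names, and the data layout, are hypothetical illustrations.

```python
# Hedged sketch of the three pairing schemes, assuming the partners
# alternate as Giver from step to step. Each talker's list holds their
# per-step values (pause rate or percentage) in step order.
def make_pairings(talker1_steps, talker2_steps):
    # (1) Each step: one talker's Giving value against the other's
    #     Receiving value for the same step.
    each_step = list(zip(talker1_steps, talker2_steps))
    # (2) Giving steps: Talker 1 gives in steps 1, 3, 5, ... (indices
    #     0, 2, 4, ...); Talker 2 gives in steps 2, 4, 6, ... (indices
    #     1, 3, 5, ...). Pair the Giving steps in order of occurrence.
    giving = list(zip(talker1_steps[0::2], talker2_steps[1::2]))
    # (3) Receiving steps: the complementary indices, paired the same way.
    receiving = list(zip(talker1_steps[1::2], talker2_steps[0::2]))
    return each_step, giving, receiving

t1 = [0.41, 0.35, 0.44, 0.30, 0.42, 0.33]  # hypothetical per-step values
t2 = [0.36, 0.45, 0.31, 0.47, 0.34, 0.43]
each_step, giving, receiving = make_pairings(t1, t2)
abs_diffs = [abs(a - b) for a, b in each_step]  # input to Early/Late comparison
```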
However, as each condition had a different number of steps – i.e., 18 (nine Giving and nine Receiving for each partner) in the Easy condition, 12 (six Giving and six Receiving for each partner) in the Medium condition, and 10 (five Giving and five Receiving for each partner) in the Hard condition – not all of the steps were used in the analysis. In the by-step pause rate/percentage difference analysis, the first four steps (Early) and the last four steps (Late) for each dyad were used; in the Giving and Receiving difference analyses, the first two same-type pairings (Early) and the last two same-type pairings (Late) for each dyad were used.

6.3 Results

6.3.1 Overall pausing differences

To determine if task difficulty had an effect on participants' pause rates and pause percentages, the overall pause rate and pause percentage in each condition was measured for both silent and filled pauses.

6.3.1.1 Silent pauses

Silent pause rate was examined using by-talker means and including all steps. One talker in the Medium condition had no silent pauses in one of her Receiving steps, and one talker in the Easy condition had no silent pauses in three of her Receiving steps. Overall, silent pause rate was higher in the Giving steps than in the Receiving steps. Mean silent pause per second values in the Giving steps were 0.41 (SD = 0.05) in the Easy condition, 0.45 (SD = 0.06) in the Medium condition, and 0.44 (SD = 0.07) in the Hard condition. In the Receiving steps, mean silent pause per second values were 0.33 (SD = 0.08) in the Easy condition, 0.39 (SD = 0.14) in the Medium condition, and 0.36 (SD = 0.11) in the Hard condition. Figure 6.1 illustrates the distributions of silent pause per second values in Giving and Receiving steps in each condition.

Figure 6.1 Boxplot of silent pause per second values by Condition and Type

The mean silent pause per second value was used as the dependent measure in a two-way ANOVA, with Condition (Easy, Medium, Hard) and Type (Giving, Receiving) as factors. There was a main effect of Type [F(1, 54) = 11.195, p = 0.0015], but no effect of Condition and no interaction between Condition and Type. This suggests that difficulty was not having an effect on talkers' silent pausing rates, but that whether they were giving or receiving instructions did tend to have an effect on their pause rate.
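All of the two-way ANOVAs reported in this section share the same specification. The sketch below shows how such a model might be fit in Python with pandas and statsmodels; it is an illustration of the design rather than the original analysis code, and the file and column names are assumptions.

    import pandas as pd
    import statsmodels.formula.api as smf
    from statsmodels.stats.anova import anova_lm

    # One row per talker per step type (30 talkers x Giving/Receiving = 60 rows),
    # which yields the residual degrees of freedom of 54 reported in the text.
    df = pd.read_csv("pause_measures.csv")  # hypothetical file

    # Mean silent pauses per second as the dependent measure, with
    # Condition and Type as fully crossed factors.
    model = smf.ols("silent_pause_rate ~ C(Condition) * C(Type)", data=df).fit()
    print(anova_lm(model, typ=2))  # F and p for Condition, Type, and their interaction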
Silent pauses as a percentage of speaking time were also examined, as talkers could have a low silent pause rate (few pauses) but a high silent pause percentage (longer pauses). The by-talker mean silent pause percentage values were again higher in the Giving steps than in the Receiving steps. Mean silent pause percentage in the Giving steps was 21.1% (SD = 3.6%) in the Easy condition, 23.9% (SD = 4.2%) in the Medium condition, and 23.1% (SD = 4.8%) in the Hard condition. In the Receiving steps, mean silent pause percentage was 16% (SD = 4.1%) in the Easy condition, 19.5% (SD = 8.7%) in the Medium condition, and 17.4% (SD = 7.2%) in the Hard condition. Figure 6.2 illustrates the distribution of silent pause percentage values in Giving and Receiving steps in each condition.

Figure 6.2 Boxplot of silent pause percentages by Condition and Type

Mean silent pause percentage was used as the dependent measure in a two-way ANOVA, with Condition (Easy, Medium, Hard) and Type (Giving, Receiving) as factors. There was an effect of Type [F(1, 54) = 11.584, p = 0.00126], but no effect of Condition and no interaction between Condition and Type; again, this suggests that difficulty did not play a role in how much talkers used silent pauses, but whether they were giving or receiving instructions did.

6.3.1.2 Filled pauses

Filled pause rate was also examined using by-talker means and including all steps. Twenty-four of the 30 participants had no filled pauses in at least one of their Receiving steps; this included nine of ten talkers in the Easy condition, eight of ten in the Medium condition, and seven of ten in the Hard condition. In addition, three of those participants – one in the Easy condition, and two in the Medium condition – had no filled pauses in at least one of their Giving steps. As in the silent pause rate analysis, mean filled pause per second values were higher in the Giving steps than in the Receiving steps. The mean by-talker filled pauses per second rate in the Giving steps was 0.059 (SD = 0.017) in the Easy condition, 0.076 (SD = 0.023) in the Medium condition, and 0.076 (SD = 0.032) in the Hard condition. In the Receiving steps, the mean filled pause per second value was 0.028 (SD = 0.015) in the Easy condition, 0.04 (SD = 0.021) in the Medium condition, and 0.065 (SD = 0.042) in the Hard condition. The distribution of filled pause per second values across conditions and types is illustrated in Figure 6.3.

Figure 6.3 Boxplot of filled pause per second values by Condition and Type

Mean filled pause per second rate was used as the dependent variable in a two-way ANOVA with Condition (Easy, Medium, Hard) and Type (Giving, Receiving) as factors. There was an effect of Condition [F(2, 54) = 4.928, p = 0.011] and an effect of Type [F(1, 54) = 14.113, p < 0.001], but no interaction between Condition and Type. The effect of Condition was then explored through paired t-tests: the difference in filled pause rates was significant at a Bonferroni-corrected α-level of 0.0167 between the Easy and Hard conditions [t(19) = -2.8404, p = 0.0105], and there was a trend towards a difference between the Easy and Medium conditions [t(19) = -2.4042, p = 0.0266]; however, there was no difference between the Medium and Hard conditions [t(19) = -1.554, p = 0.137]. Thus, the difficulty of the task did have an effect on talkers' filled pause rates; talkers in the Easy condition used fewer filled pauses per second than did those in the Hard condition, and tended to use fewer filled pauses per second than those in the Medium condition. The talkers' role also affected their filled pause rates, as was the case for silent pause rates and percentages.
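A sketch of these Bonferroni-corrected follow-up comparisons is given below. The reported t(19) values imply that the 20 by-talker means in each condition (10 talkers x two step types) were ordered so that they could be paired across conditions; that pairing scheme is an assumption, the data here are randomly generated stand-ins, and the code is illustrative only.

    from itertools import combinations
    import numpy as np
    from scipy.stats import ttest_rel

    rng = np.random.default_rng(0)
    conditions = {
        "Easy": rng.normal(0.04, 0.02, 20),    # stand-in filled pause rates
        "Medium": rng.normal(0.06, 0.02, 20),
        "Hard": rng.normal(0.07, 0.03, 20),
    }
    alpha = 0.05 / 3  # three pairwise comparisons -> corrected level of 0.0167

    for (name1, vals1), (name2, vals2) in combinations(conditions.items(), 2):
        t, p = ttest_rel(vals1, vals2)  # paired t-test, 19 degrees of freedom here
        print(f"{name1} vs {name2}: t = {t:.4f}, p = {p:.4f}, "
              f"significant at corrected alpha: {p < alpha}")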
Filled pauses as a percentage of speaking time were also examined, as talkers could have a low filled pause rate (few pauses) but a high filled pause percentage (longer pauses). As in the filled pause rate analysis, the by-talker mean filled pause percentage was higher in the Giving steps than in the Receiving steps. In the Giving steps, the mean filled pause percentage was 2.9% (SD = 1%) in the Easy condition, 3.7% (SD = 1.1%) in the Medium condition, and 3.6% (SD = 1.4%) in the Hard condition. In the Receiving steps, the mean filled pause percentage was 1.4% (SD = 0.9%) in the Easy condition, 1.9% (SD = 1.2%) in the Medium condition, and 2.6% (SD = 1.7%) in the Hard condition. The distribution of filled pause percentage values across conditions and types of steps is illustrated in Figure 6.4.

Figure 6.4 Boxplot of filled pause percentage by Condition and Type

Mean filled pause percentage was used as the dependent measure in a two-way ANOVA, with Condition and Type as factors. There was an effect of Type [F(1, 54) = 19.704, p < 0.001], but there was no effect of Condition and no interaction between Condition and Type. Thus, only whether talkers were giving or receiving instructions affected filled pause percentages; the difficulty of the condition they were in did not.

6.3.2 Pausing differences between partners

Pausing differences between partners were examined for silent pause rate and percentage and for filled pause rate and percentage to determine if their pausing behaviour was becoming more similar, and to determine if the task difficulty condition was having an effect on this. An increased absolute difference over time in a given condition would suggest that most dyads' pausing behaviour was becoming less similar, while a decreased absolute difference would suggest an increase in pausing behaviour similarity. In addition to measuring the differences in each step, the differences in Giving steps only and in Receiving steps only were also measured. Giving and Receiving were measured separately in light of the speaking time differences found in the two types of steps, as discussed in Chapter 2.

6.3.2.1 Pausing differences in each step

Looking at differences in silent pause rate in each step – i.e., in which one talker was Giving instructions and the other was Receiving – the mean absolute differences in silent pauses per second are given in Table 6.1. Overall, the differences increased over time in the Easy and Hard conditions, suggesting partners were becoming less similar in their silent pause rates, but decreased in the Medium condition, suggesting partners were becoming more similar in their silent pause rates.

Table 6.1 Mean absolute silent pause per second differences by step (standard deviations in parentheses)

        Easy           Medium         Hard
Early   0.148 (0.106)  0.159 (0.132)  0.092 (0.078)
Late    0.183 (0.107)  0.124 (0.089)  0.102 (0.074)

As can be seen in Figure 6.5, there was a fair amount of variation in the mean Early and Late absolute silent pause per second differences within each condition. In the Easy condition, three dyads' silent pause per second differences increased over time to varying degrees, suggesting their silent pause rates were becoming less similar, while two dyads' differences decreased, suggesting their silent pause rates were becoming more similar. The Medium condition showed a more consistent pattern, with four dyads' silent pause per second differences decreasing by differing degrees over time, and only one dyad's difference increasing. In the Hard condition, two dyads' silent pause per second differences increased noticeably over time, one dyad's difference increased somewhat, and two dyads' differences decreased over time.

Figure 6.5 Absolute Early and Late silent pause per second differences in each dyad by Condition. Error bars indicate +/- 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners' silent pause per second rates.

As the silent pause per second absolute difference values were not normally distributed, a series of Wilcoxon rank sum tests was used to examine the differences between mean Early and Late absolute differences in each condition. No Bonferroni-corrected (α = 0.0167) differences were found (Easy: W = 158, p = 0.2648; Medium: W = 223, p = 0.5468; Hard: W = 177, p = 0.5468), suggesting that task difficulty was not affecting whether dyads became more similar in their silent pause rates.
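The Early-versus-Late comparisons here and in the remainder of this section all take the same form, sketched below. The values are random stand-ins for the pooled per-dyad absolute differences (four steps x five dyads = 20 values per period in the by-step analysis). Note that SciPy's mannwhitneyu returns the Mann-Whitney U statistic, which for two independent samples corresponds to the W that R's wilcox.test reports; that correspondence is an assumption about how the original tests were run.

    import numpy as np
    from scipy.stats import mannwhitneyu

    rng = np.random.default_rng(1)
    alpha = 0.05 / 3  # Bonferroni-corrected level of 0.0167

    for condition in ("Easy", "Medium", "Hard"):
        early = np.abs(rng.normal(0.13, 0.10, 20))  # stand-in Early absolute differences
        late = np.abs(rng.normal(0.14, 0.10, 20))   # stand-in Late absolute differences
        w, p = mannwhitneyu(early, late, alternative="two-sided")
        print(f"{condition}: W = {w:.0f}, p = {p:.4f}, significant: {p < alpha}")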
In terms of silent pause percentage, the mean absolute differences are given in Table 6.2. As in the silent pause rate absolute differences, the difference increased over time in the Easy and Hard conditions, suggesting partners' silent pause percentages were becoming less similar over time, but decreased in the Medium condition, suggesting the percentages were becoming more similar over time.

Table 6.2 Mean absolute silent pause percentage differences by step (standard deviations in parentheses)

        Easy          Medium        Hard
Early   9.02 (6)      10.42 (9.44)  7.15 (5.01)
Late    11.08 (7.68)  9.37 (6.35)   7.43 (6.14)

As can be seen in Figure 6.6, there was a fair amount of variation in the mean Early and Late absolute silent pause percentage differences within each condition. In the Easy condition, two dyads' silent pause percentage differences increased over time (suggesting decreasing similarity), two dyads' differences decreased (suggesting increasing similarity), and one dyad stayed relatively stable. In the Medium condition, three dyads' silent pause percentage differences decreased by differing degrees over time, and two dyads' differences increased. In the Hard condition, the opposite pattern was found: three dyads' silent pause percentage differences increased over time, while two dyads' differences decreased.

Figure 6.6 Absolute Early and Late silent pause percentage differences in each dyad by Condition. Error bars indicate +/- 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners' silent pause percentage values.

As the silent pause percentage absolute difference values were not normally distributed, a series of Wilcoxon rank sum tests was used to examine the differences between mean Early and Late absolute differences in each condition. No Bonferroni-corrected (α = 0.0167) differences were found (Easy: W = 170, p = 0.4291; Medium: W = 201, p = 0.9893; Hard: W = 196, p = 0.9254). This suggests that task difficulty was not affecting the similarity over time of dyads' silent pause percentages.

Turning now to filled pause rate, the mean absolute differences in filled pauses per second are given in Table 6.3. Again, the difference increased over time in the Easy and Hard conditions, but remained relatively stable in the Medium condition.

Table 6.3 Mean absolute filled pause per second differences by step (standard deviations in parentheses)

        Easy           Medium         Hard
Early   0.048 (0.032)  0.052 (0.038)  0.053 (0.059)
Late    0.062 (0.044)  0.051 (0.033)  0.058 (0.062)

Figure 6.7 illustrates the patterns of change in filled pause per second absolute differences across dyads. In the Easy condition, four of the five dyads' differences followed the overall pattern of the condition and increased over time, suggesting talkers' filled pause rates were decreasing in similarity. In the Medium condition, two dyads' filled pause per second differences decreased over time, suggesting increasing similarity, while three dyads' differences increased by varying degrees over time, suggesting decreasing similarity.
The dyads in the Hard condition showed the same pattern as those in the Medium condition, with one of the increasing dyads and one of the decreasing dyads showing large filled pause per second difference changes from Early to Late in the conversation.

Figure 6.7 Absolute Early and Late filled pause per second differences in each dyad by Condition. Error bars indicate +/- 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners' filled pause per second values.

The filled pause per second absolute difference values were not normally distributed, so a series of Wilcoxon rank sum tests was used to examine the differences between mean Early and Late absolute differences in each condition. No Bonferroni-corrected (α = 0.0167) differences in filled pauses per second absolute differences were found (Easy: W = 170, p = 0.4291; Medium: W = 194, p = 0.8831; Hard: W = 180, p = 0.6017).

Filled pause percentage difference followed the same pattern as seen in the analyses of silent pause rate and percentage, as shown in Table 6.4. The absolute differences in mean filled pause percentage increased over time in the Easy and Hard conditions, but decreased over time in the Medium condition.

Table 6.4 Mean absolute filled pause percentage differences by step (standard deviations in parentheses)

        Easy        Medium       Hard
Early   2.04 (1.5)  2.68 (2.08)  2.09 (1.12)
Late    2.8 (2.01)  2.47 (1.64)  3.01 (2.36)

Within each condition, three of the five dyads showed an increase in filled pause percentage absolute difference over time, suggesting a decrease in similarity in filled pausing behaviour. These trends are illustrated in Figure 6.8.

Figure 6.8 Absolute Early and Late filled pause percentage differences in each dyad by Condition. Error bars indicate +/- 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners' filled pause percentage values.

Wilcoxon rank sum tests on the mean Early and Late absolute differences in filled pause percentage in each condition found no difference between the values over time (Easy: W = 160, p = 0.2888; Medium: W = 206, p = 0.8831; Hard: W = 157, p = 0.2534), suggesting task difficulty was not affecting whether partners' filled pause percentages became more similar.

Overall, there was no evidence that task difficulty was affecting whether dyads became more similar in their pausing behaviour over the course of their conversation when one partner's pausing behaviour while giving instructions was compared with the other partner's pausing behaviour when receiving instructions.

6.3.2.2 Pausing differences in Giving steps

Turning now to the changes in silent pause rate in the Giving steps only, the mean silent pause per second absolute differences in the Early and Late conversational periods in each condition are given in Table 6.5. In all three conditions, the mean difference decreased over time – most noticeably in the Easy condition – suggesting the talkers were becoming more similar in their silent pausing rates.

Table 6.5 Mean absolute silent pause per second differences, Giving steps only (standard deviations in parentheses)

        Easy           Medium         Hard
Early   0.132 (0.103)  0.083 (0.063)  0.081 (0.077)
Late    0.072 (0.059)  0.075 (0.087)  0.077 (0.071)

Looking in more detail at each condition, as illustrated in Figure 6.9, four of the five dyads in the Easy condition showed a decrease in absolute silent pause rate difference over time.
The same pattern emerged in the Medium condition, although the one dyad which showed an increase in absolute silent pause per second difference over time showed a fairly large increase. In the Hard condition, on the other hand, three dyads' differences increased to varying degrees over time.

Figure 6.9 Absolute Early and Late silent pause per second differences in each dyad, Giving steps only, by Condition. Error bars indicate +/- 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners' silent pause rates.

As the absolute silent pause rate differences in the Giving steps were not normally distributed, a series of Wilcoxon rank sum tests was used to examine the differences between mean Early and Late absolute differences in each condition. No Bonferroni-corrected (α = 0.0167) differences in absolute silent pause per second difference were found (Easy: W = 66, p = 0.2475; Medium: W = 61, p = 0.4359; Hard: W = 52, p = 0.9118), suggesting that task difficulty was not having an effect on talkers' silent pause rate similarity over time in the Giving steps.

The mean absolute silent pause percentage differences in the Early and Late portions of each condition in the Giving steps are given in Table 6.6. As in the silent pause rate analysis, absolute silent pause percentage values decreased over time in all conditions, suggesting the partners' silent pause percentages were becoming more similar over time.

Table 6.6 Mean absolute silent pause percentage differences, Giving steps only (standard deviations in parentheses)

        Easy         Medium       Hard
Early   5.86 (7.85)  5.23 (3.65)  5.09 (3.95)
Late    5.59 (3.11)  4.34 (3.04)  3.01 (2.36)

Examining the differences in each condition in more detail, in the Easy condition, four of the five dyads showed an increase in absolute silent pause percentage difference over time, while the one dyad which showed a decrease showed a large one. In the Medium condition, three of the dyads showed a decrease in silent pause percentage absolute difference. In the Hard condition, two dyads showed an increase in absolute silent pause percentage difference over time, while three showed decreases of varying degrees. These changes are illustrated in Figure 6.10.

Figure 6.10 Absolute Early and Late silent pause percentage differences in each dyad, Giving steps only, by Condition. Error bars indicate +/- 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners' silent pause percentage values.

As the absolute silent pause percentage differences in the Giving steps were not normally distributed, a series of Wilcoxon rank sum tests was used to examine the differences between mean Early and Late absolute differences in each condition. No Bonferroni-corrected (α = 0.0167) differences in silent pause percentage absolute difference were found (Easy: W = 33, p = 0.2176; Medium: W = 56, p = 0.6842; Hard: W = 41, p = 0.5288), suggesting that task difficulty was not having an effect on dyads' similarity in silent pause percentage in the Giving steps.

Turning now to filled pause rate, as can be seen in Table 6.7, the mean filled pause per second absolute difference decreased over time in the Giving steps in the Easy and Hard conditions, but remained relatively stable in the Medium condition.
Table 6.7 Mean absolute filled pause rate differences, Giving steps only (standard deviations in parentheses)

        Easy           Medium         Hard
Early   0.037 (0.018)  0.032 (0.025)  0.047 (0.027)
Late    0.021 (0.013)  0.033 (0.03)   0.041 (0.032)

Looking at the filled pause per second absolute difference changes over time by dyad in Figure 6.11, in the Easy condition, four of the five dyads showed a decrease in absolute difference over time, suggesting increased filled pause rate similarity, with the one remaining dyad increasing very slightly. In the Medium condition, three dyads showed some amount of increase in difference over time in filled pauses per second (suggesting the talkers were becoming less similar), while two dyads showed a decrease. In the Hard condition, three dyads showed a decrease in filled pause rate absolute difference over time, and two showed an increase.

Figure 6.11 Absolute Early and Late filled pause per second differences in each dyad, Giving steps only, by Condition. Error bars indicate +/- 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners' filled pause per second values.

Wilcoxon rank sum tests on the Early and Late mean filled pause per second absolute differences in the Giving steps in each condition were used, as the values were not normally distributed. The results showed a trend towards a significant difference in the Easy condition (W = 80, p = 0.02323 at a Bonferroni-corrected α-level of 0.0167), but no differences in the Medium (W = 51, p = 0.9705) and Hard (W = 56, p = 0.6842) conditions.

The absolute difference in mean filled pause percentage in the Giving steps increased over time in the Easy condition, as shown in Table 6.8; in contrast, the Hard condition's mean filled pause percentage absolute difference decreased from the Early to the Late portion of the conversation. The Medium condition stayed relatively stable.

Table 6.8 Mean absolute filled pause percentage differences, Giving steps only (standard deviations in parentheses)

        Easy         Medium       Hard
Early   1.39 (1.03)  1.78 (1.26)  2.12 (1.06)
Late    1.52 (1.29)  1.8 (1.41)   2 (1.67)

The individual dyads' filled pause percentage difference changes in the Easy condition did not reflect the overall increase over time, as illustrated in Figure 6.12: three of the five dyads' filled pause percentage absolute differences decreased over time, but of the two dyads whose difference increased, one did so quite noticeably. In the Medium condition, again, three of the five dyads showed a decrease in filled pause percentage absolute difference over time to varying degrees. The Hard condition also saw three of the five dyads decrease their filled pause percentage absolute difference over time, suggesting that those talkers were becoming more similar in filled pause percentage.

Figure 6.12 Absolute Early and Late filled pause percentage differences in each dyad, Giving steps only, by Condition. Error bars indicate +/- 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners' filled pause percentage values.

A series of Wilcoxon rank sum tests on the mean Early and Late filled pause percentage absolute differences in the Giving steps in each condition showed no Bonferroni-corrected differences (Easy: W = 48, p = 0.9118; Medium: W = 51, p = 0.9705; Hard: W = 57, p = 0.6305), suggesting that task difficulty did not have an effect on dyads' similarity over time in filled pause percentage.
In sum, there was no clear evidence that task difficulty was reliably affecting whether dyads became more similar in their pausing behaviour over the course of their conversation when comparing partners' Giving steps.

6.3.2.3 Pausing differences in Receiving steps

Turning now to the pausing patterns in the Receiving steps only, the changes in mean absolute silent pause rate difference are given in Table 6.9. The mean silent pause per second absolute difference increased in the Easy condition, decreased in the Medium condition, and stayed relatively stable in the Hard condition.

Table 6.9 Mean absolute silent pause per second differences, Receiving steps only (standard deviations in parentheses)

        Easy           Medium         Hard
Early   0.122 (0.112)  0.209 (0.161)  0.109 (0.101)
Late    0.175 (0.141)  0.165 (0.13)   0.107 (0.108)

Looking at the changes in silent pause rate absolute difference over time for each dyad, three of the dyads in the Easy condition showed increases in silent pause per second absolute difference between the Early and Late portions of their conversations, suggesting that the talkers' silent pause rates were becoming less similar. In the Medium condition, conversely, three of the dyads' silent pause per second differences decreased over time, suggesting their silent pause rates were becoming more similar. The Hard condition showed none of the dramatic changes of the other two conditions, with three dyads' absolute differences decreasing over time. These results are illustrated in Figure 6.13.

Figure 6.13 Absolute Early and Late silent pause per second differences in each dyad, Receiving steps only, by Condition. Error bars indicate +/- 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners' silent pause per second values.

Wilcoxon rank sum tests on the Early and Late mean silent pause rate absolute differences for the Receiving steps in each condition showed that none of the changes over time were significant (Easy: W = 39, p = 0.4359; Medium: W = 54, p = 0.7959; Hard: W = 49, p = 0.9705). This suggests that task difficulty did not have an effect on talkers' silent pause rate similarity over time in the Receiving steps.

In terms of silent pause percentage, the mean absolute silent pause percentage differences increased over time in the Easy and Hard conditions, suggesting the talkers were becoming less similar to their partners, and decreased in the Medium condition, suggesting the talkers were becoming more similar to their partners. These results are shown in Table 6.10.

Table 6.10 Mean absolute silent pause percentage differences, Receiving steps only (standard deviations in parentheses)

        Easy          Medium         Hard
Early   7.16 (6.06)   13.33 (11.17)  6.56 (6.36)
Late    11.29 (6.86)  8.85 (8.57)    8.43 (6.02)

Examining the trends within each condition, as illustrated in Figure 6.14, three of the five dyads in each of the Easy and Hard conditions showed an increase in absolute silent pause percentage difference over time. In the Medium condition, on the other hand, four of the five dyads showed a decrease in the absolute silent pause percentage difference between the Early and Late portions of the conversation, suggesting that they were becoming more similar in this measure.

Figure 6.14 Absolute Early and Late silent pause percentage differences in each dyad, Receiving steps only, by Condition. Error bars indicate +/- 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners' silent pause percentage values.
Wilcoxon rank sum tests on the mean silent pause percentage absolute differences between the Early and Late portions of the conversations in the Receiving steps in each condition again showed no significant differences (Easy: W = 31, p = 0.1655; Medium: W = 63, p = 0.3527; Hard: W = 44, p = 0.6842). Again, this suggests that task difficulty was not affecting dyads' similarity over time in silent pauses as a percentage of speaking time.

Looking now at filled pauses, the mean filled pause per second absolute differences in the Early and Late portions of each condition are given in Table 6.11. The absolute filled pause rate difference decreased in the Easy condition, increased slightly in the Hard condition, and remained relatively stable in the Medium condition.

Table 6.11 Mean absolute filled pause per second differences, Receiving steps only (standard deviations in parentheses)

        Easy           Medium         Hard
Early   0.037 (0.041)  0.052 (0.042)  0.07 (0.11)
Late    0.03 (0.037)   0.051 (0.039)  0.073 (0.09)

As illustrated in Figure 6.15, the changes in filled pause per second absolute difference by dyad were quite small in the Easy and Medium conditions. In both conditions, two dyads showed an increase over time, while three showed a decrease. In the Hard condition, on the other hand, three dyads showed an increase in difference over time while two showed a decrease; one of the increasing dyads and one of the decreasing dyads showed quite large changes in filled pause rate absolute difference over time.

Figure 6.15 Absolute Early and Late filled pause per second differences in each dyad, Receiving steps only, by Condition. Error bars indicate +/- 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners' filled pause rates.

A series of Wilcoxon rank sum tests was used to examine the differences between mean Early and Late filled pause per second absolute differences in the Receiving steps in each condition. No Bonferroni-corrected (α = 0.0167) differences were found (Easy: W = 58, p = 0.5655; Medium: W = 53, p = 0.8499; Hard: W = 43, p = 0.6305); as in the silent pause rate analysis, task difficulty does not appear to have had an effect on dyads' filled pause rate similarity.

In terms of the filled pause percentage absolute differences in the Receiving steps, as can be seen in Table 6.12, the mean absolute filled pause percentage difference increased over time in both the Medium and Hard conditions, suggesting that the talkers were becoming less similar to their partners, but decreased over time in the Easy condition, suggesting that talkers were becoming more similar to their partners.

Table 6.12 Mean absolute filled pause percentage differences, Receiving steps only (standard deviations in parentheses)

        Easy         Medium       Hard
Early   1.64 (1.79)  2.64 (2.75)  2.62 (1.13)
Late    1.38 (1.84)  2.76 (2.31)  2.92 (3.48)

In the Easy condition, three of the dyads showed decreases of varying degrees in their filled pause percentage difference over time, as shown in Figure 6.16; this suggests those dyads were becoming more similar in this area.
In the Medium condition, two dyads showed increases in their filled pause percentage differences, one showed a noticeable decrease, and two stayed relatively stable over time. In the Hard condition, three dyads showed a decrease over time, while the other two showed increases.

Figure 6.16 Absolute Early and Late filled pause percentage differences in each dyad, Receiving steps only, by Condition. Error bars indicate +/- 1 standard error of the mean. Higher values indicate a greater absolute difference between the dyad partners' filled pause percentages.

Wilcoxon rank sum tests on the mean filled pause percentage absolute differences in the Early and Late periods in each condition showed no significant differences (Easy: W = 59, p = 0.5149; Medium: W = 47, p = 0.8499; Hard: W = 62, p = 0.393). This indicates that in the Receiving steps, task difficulty was not having an effect on dyads' similarity over time in filled pause percentage.

Once again, there was no evidence that task difficulty was consistently affecting whether dyads became more similar in their pausing behaviour over the course of their conversation when comparing partners' Receiving steps.

6.4 Discussion

Building on the findings that increased cognitive workload can lead to increased time spent pausing, whether silently or using fillers such as 'uh' and 'um' (Bortfeld et al., 2001; Khawaja, 2010; Khawaja et al., 2008), and that talkers will sometimes, but not always, converge on the pausing patterns of a task partner (Bilous & Krauss, 1988; Pardo et al., 2013a), this chapter explored whether the talkers in the construction task described in Chapter 2 (a) would show an increased pause rate or pause percentage in the more difficult conditions, (b) would become more similar to their partners in pause rate or percentage, as shown by a decrease in the difference in their pause rates/percentages over the course of the conversation, and (c) if they became more similar in pause rates or percentages, whether that was affected by the difficulty of their task. Increased similarity could suggest that talkers were converging to their partners over time, while decreased similarity could suggest that they were diverging.

In regard to differences in pause use between conditions, the results showed only one significant difference: in the filled pause per second values between the Easy and Hard conditions, where the filled pause rate in the Hard condition was higher than that in the Easy condition. There was also a trend towards a difference in filled pause rate between the Easy and Medium conditions. Overall, filled pause rate, filled pauses as a percentage of speaking time, silent pause rate, and silent pause percentage were all higher – albeit not significantly – in the Medium and Hard conditions than in the Easy condition. This suggests that task difficulty could be having an effect on talkers' pausing behaviours – in particular, that more difficult tasks lead to more pausing – but further research is needed to determine if this is indeed the case. It is possible that an increase in sample size would strengthen these observations, which follow the pattern found in previous studies that pause rates and percentages increase with task difficulty. However, it is also possible that different talkers display different kinds of pausing behaviour when faced with a more difficult task: some talkers may conceivably pause less in increased cognitive workload conditions.
Examination of personality measures could help to untangle these factors. For example, talkers with a higher Conscientiousness score on the Big Five test typically self-report "thinking before acting" (John et al., 2008, p. 120), which could predict a tendency to pause more before giving instructions or asking questions. While Laserna et al. (2014) did not report any correlations between filled pause use and Big Five personality traits in their meta-corpus studies, they did not examine silent pause use and were not looking at different levels of cognitive workload. However, these questions will be left for future research.

While there were few consistent and reliable differences in pausing behaviour due to task difficulty, there were such differences in silent pause rate, silent pause percentage, filled pause rate, and filled pause percentage based on what role the talkers were playing in the task in a given step. Mean values for all four measures were higher when talkers were giving instructions than when they were receiving instructions. This fits the observations made by Bortfeld et al. (2001), whose task-based results showed that directors in their matching task (equivalent to the Giving role in the LEGO construction task) consistently used more filled pauses than did matchers (Receivers). It is true that both roles in the construction task in this study entail some level of cognitive workload: i.e., talkers have to both talk – either to give instructions or to indicate that they have received and understood the instructions – and build their LEGO construction at more or less the same time, regardless of which role they are in. However, the Givers also have to determine how best to describe what they are building and how to build it, as well as how to respond to any questions or misunderstandings, while they are doing both of those other tasks. Thus, it is not overly surprising that Givers have higher rates and percentages of pauses than do Receivers.

In terms of whether participants became more similar to their partners in pausing behaviour and whether any such increased similarity was affected by task difficulty, no significant results were found when absolute differences in pause rate or pause percentage values were examined in terms of task difficulty, and no clear patterns were found across the conditions. Thus, there is insufficient evidence to suggest that talkers were engaging in any kind of convergence or alignment behaviour with regard to their pausing patterns. In the analysis of changes in absolute pause rate and pause percentage differences, the talkers in the Easy and Hard conditions showed an overall tendency towards becoming less similar, with 7 of 12 measures showing an increase in the absolute difference over time. In the Medium condition, on the other hand, 7 of 12 measures showed a decrease in absolute difference over time, suggesting talkers were becoming more similar in their pausing patterns; four measures remained relatively stable over time, suggesting talkers were maintaining their patterns. Again, an increase in sample size could strengthen the consistency and reliability of these patterns, to give a clearer indication of whether or not dyads' similarity over time in pause rates and percentages is affected by task difficulty.
Nevertheless, it may well be that the diversity of behaviour seen in these measures is reflective of talkers' actual tendencies in conversational interaction, whether under workload or not: that is, some talkers may tend to converge, some to diverge, some to do each of these things at different times, and some to generally change very little. In that case, an increased sample size would better reveal that behavioural diversity.

Looking at the patterns by dyad, none of the 15 pairs showed a completely consistent pattern of increasing or decreasing similarity across all of the pausing measures. Dyad H5 in the Hard condition was the most consistent, showing an increase in absolute difference over time in all of the silent pause percentage, filled pause rate, and filled pause percentage measures, which could suggest a tendency on the part of the talkers to diverge. Dyad M2 showed a decrease in all silent pause measures, but an increase in all filled pause measures, which might indicate that they were converging on silences but not on filled pauses. Dyad E4, on the other hand, showed a decrease in all filled pause measures, suggesting that they were becoming more similar – and possibly tending to converge – in this area. Again, as suggested earlier, examination of such features as talkers' Conscientiousness scores in the Big Five personality measure (John et al., 2008) could help to further explain their pausing behaviour.

It should be noted that while talkers were required to produce words during the task, it was not the case that they were required to produce either silent or filled pauses. The results of the pausing analysis, then, unlike those of the speech rate analysis presented in Chapter 5, are shaped to some degree by the fact that not all talkers used silent or filled pauses in all of their steps. As indicated in section 6.3.1, while there were only two talkers who had steps with no silent pauses (both in Receiving steps), there were 24 talkers who had at least one step with no filled pauses. In the Easy condition, the talkers who had steps with no filled pauses – nine of 10 talkers – all had at least two of their nine Receiving steps with no filled pauses, with one talker having eight of nine Receiving steps (plus two Giving steps) without filled pauses. In the Medium condition, the number of Receiving steps with no filled pauses among the eight talkers who displayed this behaviour ranged from one (two talkers) to four (three talkers); in addition, one talker had one Giving step which did not have any filled pauses. In the Hard condition, five of the seven talkers who had at least one Receiving step with no filled pauses had only one step (out of five) in which they behaved this way, while the other two had two steps without filled pauses; no talkers in this condition had a Giving step with no filled pauses. These results are interesting for two reasons. The absence of filled pauses in a large number of the Receiving steps – fully 75 of 200 steps, or 37.5% – undoubtedly had an effect on the statistical analyses, meaning that a larger sample size would be highly advisable in further studies. Also, the number of Receiving steps without filled pauses was highest in the Easy condition (47 of 90 steps, or 52%), lower in the Medium condition (19 of 60 steps, or 31.7%), and lowest in the Hard condition (9 of 50 steps, or 18%), which once again echoes the pattern found in this study and others that higher cognitive workload tends to lead to more filled pausing.
If the talkers in Khawaja (2010) and Khawaja et al. (2008) behaved in the same way as the talkers in this study, it is perhaps not surprising that those studies did not find significant differences between the low- and high-workload conditions for their filled pause percentage measures, in contrast to the significant differences shown by the overall pausing (filled and silent) percentage and silent pause percentage measures: there may simply not have been enough filled pauses to create significant differences. Further studies in this area will thus require much larger sample sizes to compensate for the optionality of using filled pauses.

An additional area of interest emerging from this study would be to explore whether talkers' use of 'uh' versus 'um' changes due either to convergence or to task difficulty. The talkers in this study were all in the social group that would be expected to have higher rates of 'um' than 'uh': i.e., young and female (Liberman, 2005, 2014; Tottie, 2011; Laserna et al., 2014). However, if they interact regularly with talkers with high 'uh' rates (e.g., older males), they may have higher rates of 'uh' use. If a high-'um' talker is paired with a high-'uh' talker, will their 'uh'/'um' proportions change? As well, if there is a difference in the delay-signaling value of 'uh' and 'um' (Clark & Fox Tree, 2002), it may be the case that one or the other form is used more frequently in higher-workload conditions. Again, this research will be left for further study.

Chapter 7: General discussion

7.1 Summary of the study

The study described in this dissertation examined the interaction of two factors which have been shown to affect speech behaviour: speech convergence, the tendency of talkers to become more similar in their speech to someone they are talking or listening to, and task difficulty, the effect of increasing cognitive workload on talkers' speech production and perception. The interaction of task difficulty and speech convergence was explored using a dyadic interaction task. Fifteen dyads (30 participants total) participated, with five dyads in each of three levels of difficulty: Easy, Medium, and Hard. Analyses were conducted on the dyads' conversations in four areas:

(1) Perceived similarity over time: would listeners judge the pairs of talkers to be more similar in their utterances later in the task than earlier?

(2) Acoustic similarity over time: would global acoustic similarity algorithms find greater similarity in the dyads' utterances later in the task than earlier?

(3) Speech rate similarity over time: looking at both words per second and syllables per second rates, would the talkers become more similar in their speech rates over the course of the conversation?

(4) Similarity in pausing behaviour over time: looking at both silent and filled (i.e., 'uh' and 'um') pauses, would the talkers become more similar over time in how often and for how long they paused?

In the perceptual judgment task, described in Chapter 3, it was found that listeners' similarity ratings did show different patterns depending on the difficulty condition. In the Easy condition, listeners rated utterance pairings from the final third of a dyad's conversation as more similar than those in both the first third and the second third. In the Medium condition, the similarity ratings in the final third of the conversation trended higher than those in the first third and the second third.
In the Hard condition, on the other hand, listeners' ratings of talkers' vocal similarity trended lower in the final third of the conversation than those in the first third and the second third. When by-dyad means were taken into account, the overall patterns from the by-listener analysis were broadly replicated, albeit without significance.

In the global acoustic similarity analyses, described in Chapter 4, it was found that neither amplitude envelope similarity values nor mel-frequency cepstral coefficient (MFCC) similarity values showed reliably or consistently different patterns in any of the conditions when examined over time. There was a weak positive correlation between the acoustic similarity values and the perceptual judgment ratings, suggesting that spectral similarity was playing some role in listeners' assessments of talkers' vocal similarity, but that it was not the only factor in those judgments. That is, unlike the algorithms used, the listeners were not restricted to using only spectral characteristics of talkers' speech when assessing similarity: they could not only incorporate all of what was accessible to them in the speech signal in making their judgments, but also modify their judgments over time based on their increasing familiarity with the talkers' voices.

In the analysis of speech rate described in Chapter 5, the results showed that the talkers in the more difficult conditions – Medium and Hard – did not have a significantly higher speech rate than those in the Easy condition, whether rate was measured using words per second or syllables per second, and whether speaking rate – which included all the time a talker used for producing speech, including pauses, laughter, coughs, etc. – or articulation rate – which included only the time a talker required to produce words – was the relevant measure. In terms of increasing similarity in speech rate over time, no reliable or consistent patterns emerged, and no evidence was found that task difficulty affected whether talkers became more similar in their speech rates.
However, what exactly is being converged on more by the talkers in the easier conditions than by the talkers in the Hard condition has not yet been pinpointed, as consistent patterns were not found in the global acoustic analyses, the measures of speech rate, or the measures of pausing behaviour. As well, talkers were found to use filled pauses at a higher rate in the more difficult conditions than in the Easy condition; no other pausing or speech rate measures showed consistent and reliable changes due to difficulty.

7.2 General discussion

The finding that task difficulty reduces talkers' tendency to become more similar to their partners over time, as assessed by listeners, is one that can be explained in terms of the current theoretical approaches to speech convergence. In terms of the more automatic approaches to convergence – the perception-behaviour link proposed by Bargh and colleagues (e.g., Bargh et al., 1996; Dijksterhuis & Bargh, 2001) and the speech production-perception coupling models of Pickering and Garrod (Pickering & Garrod, 2004a, b, 2013; Garrod & Pickering, 2009) – this suggests that additional cognitive workload may cause a reallocation of attentional resources, as seen in the effects of workload on speech perception (Casini et al., 2009; Mattys & Wiget, 2011; Mattys et al., 2009, 2014). This may lead to a global damping of the perception-behaviour link/production-perception coupling, which could be intentional to some degree – i.e., in order to allow talkers to attend to higher-level concerns or to enhance the behaviours required for goal attainment. Alternatively, if the increased workload in the Hard condition leads to only certain elements of the speech stream being fully perceptually processed – e.g., in Mattys and colleagues' proposals, lexical-semantic information would be better perceived than would acoustic-phonetic information – then only those elements of the stream which are perceived would be available to be imitated. If the elements which are subject to reduced perceptual availability in the higher-workload conditions – which would then be converged on to a lesser degree – are particularly salient to listeners in their judgments of talkers' similarity, the lower similarity ratings over time in the Hard condition can be easily explained.

In terms of Giles' Communication Accommodation Theory (CAT), the additional workload in the Hard condition may induce stress or anxiety for the talkers (see e.g., Gudykunst, 1995; Giles & Ogay, 2007), leading to a lack of social synchrony and a failure to converge. If the stress or workload is seen as emanating from the other talker, a talker may choose to maintain her individual speech patterns to express social displeasure. As discussed in Chapter 3, it is not obvious from the audio recordings that any of the participants became frustrated with their partners; examination of the video recordings may prove useful in exploring this possibility further, as participants may have expressed frustration through gestures or facial expressions rather than through words. If talkers were not experiencing more stress, anxiety, or frustration in the Hard condition than in the easier conditions, it is more difficult to explain the lack of increased similarity over time from a CAT standpoint, as converging to an interlocutor is thought to improve the effectiveness of communication (see e.g., Thakerar et al., 1982; Giles & Coupland, 1991; Giles et al., 1991; Giles & Ogay, 2007).
The results of this dissertation may be taken as a preliminary investigation of the relationship between the effects of task difficulty and speech convergence on talkers' speech behaviour over time. This was a relatively small study, with only five dyads in each condition. Future research using a larger number of dyads would help to determine whether the variability seen in the analyses was the result of a lack of power or of a genuine tendency for talkers to display different convergence and workload-related speech behaviours. It may well be the case that the variation seen in whether dyads became more similar over time is a result of the talkers in the dyads responding differently to the increased workload – that is, as suggested by Hecker et al. (1968) and reinforced by later studies such as Lively et al. (1993), not all talkers show the same kinds of vocal changes when engaged in a difficult task. One way to determine this would be to collect samples of talkers' productions individually under conditions of workload – e.g., through a dual-task paradigm such as those used by Mattys and Wiget (2011), Lively et al. (1993), or Khawaja (2010) – to see what kinds of changes they will display, as well as to test their perception under conditions of workload to determine whether they display the types of changes in cue use proposed by Mattys and colleagues (Mattys et al., 2005, 2009, 2014; Mattys & Wiget, 2011). While it is true that talkers may display different reactions to workload in a conversational situation than they do when they are speaking and listening individually, having an individual baseline with which to compare their dyadic behaviour would be useful.

It would also be useful to explore the personality and cognitive measures which were collected prior to talkers' participation in the construction task – their Big Five personality scores (John et al., 1991, 2008; Benet-Martinez & John, 1998), Autism Spectrum Quotients (Baron-Cohen et al., 2001), RSPAN working memory scores (Turner & Engle, 1989; Engle, 2002; Unsworth et al., 2005), and mental rotation scores (Shepard & Metzler, 1971; Moreau, 2013) – to determine if there is an effect of these factors on their tendency to converge, on their behaviour in more difficult conditions, or on both. One interesting possibility is that there are talkers who become more similar to their partners regardless of the circumstances in which they are talking, and others who do not become more similar no matter how conducive the conditions are. The patterns seen in the individual dyads across all the measures lend some anecdotal support to this possibility; these patterns are illustrated in Table 7.1. In this representation, shades of orange indicate measures on which dyads became more similar over time (darker indicates a 5% or greater change from the initial measurement, while lighter indicates a change of 1-4.99% over time), shades of green indicate measures on which dyads became less similar over time, and grey cells indicate measures on which dyads stayed stable over time (less than a 1% change over time).

Table 7.1 Heat map of dyads' similarity over time on the 27 measures examined in this study. Dark orange: dyads became more similar (≥ 5% change from starting value) over time. Light orange: dyads became slightly more similar (1-4.99% change) over time. Grey: dyads remained stable over time (0-0.99% change) on the measure. Light green: dyads became slightly less similar (1-4.99% change) over time. Dark green: dyads became less similar (≥ 5% change) over time.
Perc: perceived similarity. Acous: global acoustic similarity. AE: amplitude envelope analysis. MFCC: mel-frequency cepstral coefficient analysis. Syll: syllable. Sp: speaking rate. Art: articulation rate.

[The colour-coded grid itself – one row per dyad (E1-E5, M1-M5, H1-H5) and one column per measure: Perc; Acous (AE, MFCC); Speech Rate (All, Giving, and Receiving steps x Word/Syll. x Sp/Art); and Pausing (All, Giving, and Receiving steps x Silent/Filled x Rate/%) – is not reproducible in plain text.]

In the 27 measures that were examined – one measure of perceived similarity, two of acoustic similarity, and 12 measures each of speech rate and pausing behaviour similarity – most of the dyads became more or slightly more similar over time on between one-third and two-thirds of the measures which were examined. However, there were three dyads which showed increasing similarity on fewer than a third of the measures: one in the Easy condition (E1), whose similarity increased over time on only four of the 27 measures, and two in the Hard condition (H3 and H4), whose similarity increased on seven and three measures respectively. There were also four dyads which showed increasing similarity on more than two-thirds of the measures: three in the Medium condition (M1, M2, and M5), whose similarity increased on 21, 21, and 19 measures, respectively, and one in the Hard condition (H1), which showed increased similarity on 22 measures. While the Medium condition showed a majority of 'convergers', the Hard condition had both 'convergers' – including the dyad which converged on the most measures – and 'non-convergers'. As well, there were very few instances in which dyads remained stable over time on a particular measure; of the 405 individual measurements represented in Table 7.1 (15 dyads x 27 measures), only 11, or 2.7%, showed stability. It may be the case that the natural tendency in conversation is for talkers to change their behaviour relative to that of their partner, whether that is in the direction of greater similarity or of greater difference. Further exploration is needed to determine why individual talkers behave the way they do – that is, whether they are convergers, non-convergers, or simply changers (see also the discussion in Sonderegger, 2012, on the plasticity of talkers' sound systems).
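The banding used to construct Table 7.1 and the counts above can be sketched as follows, assuming the percent change is computed on a dyad's Early-to-Late value relative to its Early value (for the difference measures, a shrinking absolute difference means the partners became more similar). The thresholds follow the table legend; the function name is illustrative.

    def similarity_change_category(early_diff, late_diff):
        # Classify a dyad's change on one measure, following Table 7.1's bands
        pct_change = 100.0 * (late_diff - early_diff) / early_diff
        if pct_change <= -5:
            return "more similar"
        if pct_change <= -1:
            return "slightly more similar"
        if pct_change < 1:
            return "stable"
        if pct_change < 5:
            return "slightly less similar"
        return "less similar"

    # e.g., the Easy condition's by-step silent pause rate differences
    # (Table 6.1): 0.148 Early vs. 0.183 Late, an increase of roughly 24%.
    print(similarity_change_category(0.148, 0.183))  # less similar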
In addition to gathering information from a larger sample size and exploring talkers' personal traits and cognitive abilities in relation to convergence and difficulty, the question of whether talkers become more or less similar in their lexical behaviour under conditions of workload should also be examined. Several factors suggest this could be a fruitful area of investigation. Speech-in-interaction researchers have found that talkers come to use the same vocabulary and referring terms when collaborating on a task (e.g., Clark & Wilkes-Gibbs, 1986; Garrod & Anderson, 1987; Garrod & Doherty, 1994; Brennan & Clark, 1996). Khawaja and colleagues (Khawaja, 2010; Khawaja et al., 2012) found that talkers working in a team changed how they spoke in high-workload conditions, using fewer positive emotion words and more negative emotion words; more words describing mental states (e.g., 'believe', 'think') and perceptions (e.g., 'see', 'hear'); more words expressing disagreement and fewer expressing agreement; and more plural pronouns than singular ones. As well, Mattys and colleagues (Mattys et al., 2009, 2014; Mattys & Wiget, 2011) have found that listeners perceiving speech under cognitive workload tend to rely more on lexical-semantic cues than on acoustic-phonetic cues to segment it. This previous work suggests that under difficulty, talkers may become more similar in their lexical choices than they do in other areas of their speech; further research could determine whether this is indeed the case.

A factor which played an important role in shaping how the study developed, and which influenced how talkers' speech behaviour changed, was whether they were giving or receiving instructions at any point in the task. Overall, talkers acting as Givers spoke more than those acting as Receivers, spoke more slowly than Receivers, and used more filled and silent pauses than Receivers. It has been claimed in the speech-in-dyadic-interaction literature (e.g., Bortfeld et al., 2001; Pardo et al., 2013a) that giving instructions is more difficult than receiving them. However, the behaviours described above do not fall completely in line with what would be expected of speech in difficult conditions if Giving is indeed more difficult than Receiving. For example, Khawaja (2010) and Khawaja et al. (2008) found that talkers use more pauses in high-load conditions, as was found for the Givers when compared with the Receivers in this study. On the other hand, the finding that talkers speak more quickly in high-load conditions (Griffin & Williams, 1987; Lively et al., 1993; Brenner et al., 1994; Scherer et al., 2002) was not replicated here for the Givers; instead, it was the Receivers who showed the higher speech rates, even when articulation rates (i.e., words or syllables divided by the time required to articulate them) were used rather than speaking rates (i.e., words or syllables divided by the total amount of time in a step). It may be that difficult conditions in conversational contexts lead to different behaviour than difficult conditions in monologue contexts do; as suggested earlier, having talkers complete an individual speaking task under workload prior to the dyadic task could untangle this possibility. Methodologically, the greater amount of speech from the Givers meant that it was not possible to collect sufficient material from talkers' Receiving steps for use in the perceptual judgment task and acoustic similarity analyses, so only speech from Giving steps was analyzed. As well, the overall lack of filled and silent pauses in the Receivers' steps suggests that a much larger sample would be needed in future studies to determine whether there are systematic changes.
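The difference between the two rate measures is simply one of denominators: speaking rate divides by all of the time in a step, pauses included, while articulation rate divides by articulation time only. A minimal illustration in Python (the numbers are invented):

    def speaking_rate(n_units, total_time_s):
        """Words or syllables divided by the total time in a step."""
        return n_units / total_time_s

    def articulation_rate(n_units, total_time_s, pause_time_s):
        """Words or syllables divided by articulation time only,
        i.e. total step time minus pause time."""
        return n_units / (total_time_s - pause_time_s)

    # A step with 120 syllables in 60 s, 15 s of which is pausing:
    print(speaking_rate(120, 60.0))            # 2.0 syllables/s
    print(articulation_rate(120, 60.0, 15.0))  # ~2.67 syllables/s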
While the differences between Givers and Receivers can be problematic in this way, they also raise interesting questions: Why did Receivers speak more quickly than Givers? Does receiving instructions in difficult conditions create reliably different behaviour than giving them? These questions are left for further study.

Whatever future studies are conducted, they should be undertaken with the acknowledgement that while both convergence and cognitive workload have generally clear and reasonably reliable effects on speech behaviour in controlled laboratory settings, they may not have the same clear effects when pursued in the 'real world'. As this dissertation has demonstrated, when talkers are largely left to their own devices in terms of, for example, what vocabulary to use, what attitude to take towards their interlocutor, how to approach their task, and how long to take completing it, they do not necessarily behave in predictable ways when it comes to becoming more similar or reacting to presumed cognitive workload. The range of behaviours displayed in this study may in fact be the norm rather than an aberration from some overall human tendency. It may also be that a broadly-based indication of convergence, such as that provided by listeners' judgments, is the best that can be achieved in a generally unrestricted conversational setting; that is, the types of domain-specific convergence seen in the laboratory may require those limited-input, contextually- and environmentally-controlled conditions to emerge fully and become significant. The same may be true of task difficulty: dual-task conditions in the laboratory may elicit rather different vocal responses and cognitive strategies than do complex conditions of workload 'in the wild'. There is no doubt that talkers can and do become more similar to each other over time, and that cognitive workload affects how they produce and perceive speech; nevertheless, as in many other areas of speech, how, when, and why this happens may be more interestingly complex than we had expected.

References

Abel, J., Babel, M., & Black, A. (2011). Phonetic imitation in contexts of stimulus-directed and non-stimulus-directed attention. Poster presented at the 162nd Meeting of the Acoustical Society of America, San Diego, CA, November 2011.
Abrego-Collier, C., Grove, J., Sonderegger, M., & Yu, A. C. L. (2011). Effects of speaker evaluation on phonetic convergence. In Proceedings of the 17th International Congress of Phonetic Sciences, 192-195.
Anderson, A. H., Bader, M., Bard, E. G., Boyle, E., Doherty, G., Garrod, S., Isard, S., Kowtko, J., McAllister, J., Miller, J., Sotillo, C., Thompson, H. S., & Weinert, R. (1991). The HCRC map task corpus. Language and Speech, 34(4), 351-366.
Anisfeld, M. (1996). Only tongue protrusion modeling is matched by neonates. Developmental Review, 16, 149-161.
Babel, M. (2010). Dialect divergence and convergence in New Zealand English. Language in Society, 39, 437-456.
Babel, M. (2012). Evidence for phonetic and social selectivity in spontaneous phonetic imitation. Journal of Phonetics, 40, 177-189.
Babel, M., & Bulatov, D. (2012). The role of fundamental frequency in phonetic accommodation. Language and Speech, 55(2), 231-248.
Babel, M., McAuliffe, M., & Haber, G. (2013). Can mergers-in-progress be unmerged in speech accommodation? Frontiers in Psychology, 4, Article 653. DOI: 10.3389/fpsyg.2013.00653.
Babel, M., McAuliffe, M., & McGuire, G. (2014). Global similarity and listener judgments of phonetic accommodation. Paper presented at the Workshop on Interpersonal Coordination and Phonetic Convergence, University of Cologne, May 2014.
Bargh, J. A., Chen, M., & Burrows, L. (1996). Automaticity of social behavior: direct effects of trait construct and stereotype activation on action. Journal of Personality and Social Psychology, 71(2), 230-244.
Baron-Cohen, S., Wheelwright, S., Skinner, R., Martin, J., & Clubley, E. (2001). The Autism-Spectrum Quotient (AQ): Evidence from Asperger syndrome/high-functioning autism, males and females, scientists and mathematicians. Journal of Autism and Developmental Disorders, 31(1), 5-17.
Bell, A. (1984). Language style as audience design. Language in Society, 13(2), 145-204.
Benet-Martínez, V., & John, O. P. (1998). Los Cinco Grandes across cultures and ethnic groups: Multitrait multimethod analyses of the Big Five in Spanish and English. Journal of Personality and Social Psychology, 75, 729-750.
Bilous, F. R., & Krauss, R. M. (1988). Dominance and accommodation in the conversational behaviours of same- and mixed-gender dyads. Language and Communication, 8(3/4), 183-194.
Black, A. (2012). Acoustic and social parameters on phonetic imitation: gender, emotion, and feature saliency. University of British Columbia Working Papers in Linguistics, 33, 16-33.
Boberg, C. (2008). Regional phonetic differentiation in standard Canadian English. Journal of English Linguistics, 36(2), 129-154.
Boersma, P., & Weenink, D. (2012). Praat: doing phonetics by computer [computer program]. Version 5.3.32, retrieved 11 November 2012 from http://www.praat.org/.
Bortfeld, H., & Brennan, S. E. (1997). Use and acquisition of idiomatic expressions in referring by native and non-native speakers. Discourse Processes, 23(2), 119-147.
Bortfeld, H., Leon, S. D., Bloom, J. E., Schober, M. F., & Brennan, S. E. (2001). Disfluency rates in conversation: Effects of age, relationship, topic, role, and gender. Language and Speech, 44(2), 123-147.
Bourhis, R. Y., & Giles, H. (1977). The language of intergroup distinctiveness. In H. Giles (Ed.), Language, Ethnicity and Intergroup Relations (pp. 119-135). London: Academic Press.
Branigan, H. P., Pickering, M. J., & Cleland, A. A. (2000). Syntactic co-ordination in dialogue. Cognition, 75, B13-B25.
Brennan, S. E., & Clark, H. H. (1996). Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory and Cognition, 22(6), 1482-1493.
Brennan, S. E., & Schober, M. F. (2001). How listeners compensate for disfluencies in spontaneous speech. Journal of Memory and Language, 44, 274-296.
Brennan, S. E., & Williams, M. (1995). The feeling of another's knowing: prosody and filled pauses as cues to listeners about the metacognitive states of speakers. Journal of Memory and Language, 34, 383-398.
Brenner, M., Doherty, E. T., & Shipp, T. (1994). Speech measures indicating workload demand. Aviation, Space, and Environmental Medicine, 65, 21-26.
Butterworth, B. (1975). Hesitation and semantic planning in speech. Journal of Psycholinguistic Research, 4(1), 75-87.
Byrne, D., Dillon, H., Tranh, K., Arlinger, K., Wilbraham, K., Cox, R., Hagerman, B., Hetu, R., Kei, J., Lui, C., Kiessling, J., Nasser Kotby, M., Nasser, N. H. A., et al. (1994). An international comparison of long-term average speech spectra. Journal of the Acoustical Society of America, 96(4), 2108-2120.
Casini, L., Burle, B., & Nguyen, N. (2009). Speech perception engages a general timer: evidence from a divided attention word identification task. Cognition, 112, 318-322.
Chambers, J. M., & Carbonaro, M. (2003). Designing, developing, and implementing a course on LEGO robotics for technology teacher education. Journal of Technology and Teacher Education, 11(2), 209-241.
Chartrand, T. L., & Bargh, J. A. (1999). The chameleon effect: the perception-behavior link and social interaction. Journal of Personality and Social Psychology, 76(6), 893-910.
Chartrand, T. L., & van Baaren, R. (2009). Human mimicry. Advances in Experimental Social Psychology, 41, 219-274.
Childers, D. G., Skinner, D. P., & Kemerait, R. C. (1977). The cepstrum: a guide to processing. Proceedings of the IEEE, 65(10), 1428-1443.
Christenfeld, N. (1994). Options and ums. Journal of Language and Social Psychology, 13(2), 192-199.
Christenfeld, N. (1995). Does it hurt to say um? Journal of Nonverbal Behavior, 19(3), 171-186.
Clark, H. H., & Fox Tree, J. E. (2002). Using uh and um in spontaneous speaking. Cognition, 84, 73-111.
Clark, H. H., & Krych, M. A. (2004). Speaking while monitoring addressees for understanding. Journal of Memory and Language, 50, 62-81.
Clark, H. H., & Schaefer, E. F. (1989). Contributing to discourse. Cognitive Science, 13, 259-294.
Clark, H. H., & Wilkes-Gibbs, D. (1986). Referring as a collaborative process. Cognition, 22, 1-39.
Cliburn, D. C. (2006). Experiences with the LEGO Mindstorms™ throughout the undergraduate computer science curriculum. Paper presented at the 36th ASEE/IEEE Frontiers in Education Conference, San Diego, CA, October 28-31, 2006.
Dijksterhuis, A., & Bargh, J. A. (2001). The perception-behavior expressway: automatic effects of social perception on social behavior. Advances in Experimental Social Psychology, 33, 1-40.
Engle, R. W. (2002). Working memory capacity as executive attention. Current Directions in Psychological Science, 11(1), 19-23.
Erard, M. (2004). Just like, er, words, not um, throwaways. The New York Times, January 3, 2004. Retrieved March 7, 2015 from http://www.nytimes.com/2004/01/03/arts/think-tank-just-like-er-words-not-um-throwaways.html.
Fawcett, R. P., & Perkins, M. R. (1981). Project report: language development in 6- to 12-year-old children. First Language, 2, 75-79.
Fernandes, T., Kolinsky, R., & Ventura, P. (2010). The impact of attention load on the use of statistical information and coarticulation as speech segmentation cues. Attention, Perception, & Psychophysics, 72(6), 1522-1532.
Freeman, L. A. (2003). Simulation and role playing with LEGO® blocks. Journal of Information Systems Education, 14(2), 137-144.
Garrod, S., & Anderson, A. (1987). Saying what you mean in dialogue: a study in conceptual and semantic co-ordination. Cognition, 27, 181-218.
Garrod, S., & Doherty, G. (1994). Conversation, coordination, and convention: an empirical investigation of how groups establish linguistic conventions. Cognition, 53, 181-215.
Garrod, S., & Pickering, M. J. (2004). Why is conversation so easy? Trends in Cognitive Sciences, 8(1), 8-11.
Garrod, S., & Pickering, M. J. (2009). Joint interaction, interactive alignment, and dialog. Topics in Cognitive Science, 1, 292-304.
Giles, H. (1973). Accent mobility: a model and some data. Anthropological Linguistics, 15(2), 87-105.
Giles, H., Taylor, D. M., & Bourhis, R. (1973). Toward a theory of interpersonal accommodation through language: some Canadian data. Language in Society, 2(2), 177-192.
Giles, H., & Coupland, N. (1991). Language: contexts and consequences. Milton Keynes: Open University Press.
Giles, H., Coupland, N., & Coupland, J. (1991). Accommodation theory: Communication, context, and consequence. In H. Giles, J. Coupland, and N. Coupland (Eds.), Contexts of Accommodation: Developments in applied sociolinguistics (pp. 1-68). Cambridge: Cambridge University Press.
Giles, H., & Ogay, T. (2007). Communication Accommodation Theory. In B. B. Whaley and W. Samter (Eds.), Explaining Communication: Contemporary Theories and Exemplars (pp. 293-310). Mahwah, NJ: Lawrence Erlbaum Associates.
Goldinger, S. D. (1998). Echoes of echoes? An episodic theory of lexical access. Psychological Review, 105(2), 251-279.
Goldman-Eisler, F. (1972). Pauses, clauses, sentences. Language and Speech, 15(2), 103-113.
Gordon, P. C., Eberhardt, J. L., & Rueckl, J. G. (1993). Attentional modulation of the phonetic significance of acoustic cues. Cognitive Psychology, 25, 1-42.
Gregory, S. W. (1990). Analysis of fundamental frequency reveals covariation in interview partners' speech. Journal of Nonverbal Behavior, 14(4), 237-251.
Gregory, S. W., Webster, S., & Huang, G. (1993). Voice pitch and amplitude convergence as a metric of quality in dyadic interviews. Language and Communication, 13(3), 195-217.
Gregory, S. W., & Webster, S. (1996). A nonverbal signal in voices of interview partners effectively predicts communication accommodation and social status perceptions. Journal of Personality and Social Psychology, 70(6), 1231-1240.
Griffin, G. R., & Williams, C. E. (1987). The effects of different levels of task complexity on three vocal measures. Aviation, Space, and Environmental Medicine, 58, 1165-1170.
Grosjean, F., & Lane, H. (1976). How the listener integrates the components of speaking rate. Journal of Experimental Psychology: Human Perception and Performance, 2(4), 538-543.
Gudykunst, W. B. (1995). Anxiety/Uncertainty Management (AUM) theory: current status. In R. L. Wiseman (Ed.), Intercultural Communication Theory (pp. 8-58). Thousand Oaks, CA: SAGE Publications.
Hall, K. C., Allen, B., Fry, M., Mackie, S., & McAuliffe, M. (2014). Phonological CorpusTools [computer program]. Version 0.15, retrieved 29 September 2014 from https://sourceforge.net/projects/phonologicalcorpustools.
Healey, P. G. T., Purver, M., & Howes, C. (2014). Divergence in dialogue. PLoS One, 9(6), e98598. DOI: 10.1371/journal.pone.0098598.
Hecker, M. H. L., Stevens, K. N., von Bismarck, G., & Williams, C. E. (1968). Manifestations of task-induced stress in the acoustic speech signal. Journal of the Acoustical Society of America, 44(4), 993-1001.
Henderson, A., Goldman-Eisler, F., & Skarbek, A. (1966). Sequential temporal patterns in spontaneous speech. Language and Speech, 9(4), 207-216.
Hickok, G., Houde, J., & Rong, F. (2011). Sensorimotor integration in speech processing: computational basis and neural organization. Neuron, 69(3), 407-422.
Huttunen, K., Keränen, H., Väyrynen, E., Pääkkönen, R., & Leino, T. (2011). Effects of cognitive load on speech prosody in aviation: Evidence from military simulator flights. Applied Ergonomics, 42, 348-357.
John, O. P., Donahue, E. M., & Kentle, R. L. (1991). The Big Five Inventory--Versions 4a and 54. Berkeley, CA: University of California, Berkeley, Institute of Personality and Social Research.
John, O. P., Naumann, L. P., & Soto, C. J. (2008). Paradigm shift to the Integrative Big-Five Trait Taxonomy: History, measurement, and conceptual issues. In O. P. John, R. W. Robins, and L. A. Pervin (Eds.), Handbook of personality: Theory and research (pp. 114-158). New York: Guilford Press.
Jones, S. S. (2009). The development of imitation in infancy. Philosophical Transactions of the Royal Society B, 364, 2325-2335.
Jungers, M. K., & Hupp, J. M. (2009). Speech priming: Evidence for rate persistence in unscripted speech. Language and Cognitive Processes, 24(4), 611-624.
Kasl, S. V., & Mahl, G. F. (1965). The relationship of disturbances and hesitations in spontaneous speech to anxiety. Journal of Personality and Social Psychology, 1(5), 425-433.
Khawaja, M. A. (2010). Cognitive load measurement featuring speech and linguistic features. Unpublished doctoral dissertation, University of New South Wales.
Khawaja, M. A., Ruiz, N., & Chen, F. (2008). Think before you talk: An empirical study of relationship between speech pauses and cognitive load. In Proceedings of OZCHI 2008 (Cairns, Queensland, Australia), 335-338.
Khawaja, M. A., Chen, F., & Marcus, N. (2012). Analysis of collaborative communication for linguistic cues of cognitive load. Human Factors, 54(4), 518-529.
Kim, M. (2012). Phonetic accommodation after auditory exposure to native and nonnative speech. Unpublished doctoral dissertation, Northwestern University.
Kim, M., Horton, W. S., & Bradlow, A. R. (2011). Phonetic convergence in spontaneous conversations as a function of interlocutor language difference. Laboratory Phonology, 2, 125-156.
Kitzing, P. (1986). LTAS criteria pertinent to the measurement of voice quality. Journal of Phonetics, 14, 477-482.
Klassner, F., & Anderson, S. D. (2003). LEGO Mindstorms: not just for K-12 anymore. IEEE Robotics and Automation Magazine (June 2003), 12-18.
Klingholtz, F. (1990). Acoustic representation of speaking-voice quality. Journal of Voice, 4(3), 213-219.
Krantz, J. H. (2013). Cognition laboratory experiments: Instructions for the mental rotation experiment. Retrieved February 15, 2013 from http://psych.hanover.edu/JavaTest/CLE/Cognition/Cognition/mentalrotation_instructions.html.
Krych-Applebaum, M., Banzon Law, J., Jones, D., Barnacz, A., Johnson, A., & Keenan, J. P. (2007). "I think I know what you mean": The role of theory of mind in collaborative communication. Interaction Studies, 8(2), 267-280.
Kuhl, P. K., & Meltzoff, A. N. (1982). The bimodal perception of speech in infancy. Science, 218(4577), 1138-1141.
Kuhl, P. K., & Meltzoff, A. N. (1996). Infant vocalizations in response to speech: vocal imitation and developmental change. Journal of the Acoustical Society of America, 100(4), 2425-2438.
Labov, W., Ash, S., & Boberg, C. (2006). Atlas of North American English: Phonetics, phonology and sound change. Berlin: Mouton de Gruyter.
Laserna, C. M., Seih, Y-T., & Pennebaker, J. W. (2014). Um...who like says you know: filler word use as a function of age, gender and personality. Journal of Language and Social Psychology, 33(3), 328-338.
Lauwaert, M. (2008). Playing outside the box – on LEGO toys and the changing world of construction play. History and Technology, 24(3), 221-237.
LeGoff, D. B. (2004). Use of LEGO© as a therapeutic medium for improving social competence. Journal of Autism and Developmental Disorders, 34(5), 557-571.
LeGoff, D. B., & Sherman, M. (2006). Long-term outcome of social skills intervention based on interactive LEGO© play. Autism, 10(4), 317-329.
Levelt, W. J. M., & Kelter, S. (1982). Surface form and memory in question answering. Cognitive Psychology, 14, 78-106.
Levitan, R., & Hirschberg, J. (2011). Measuring acoustic-prosodic entrainment with respect to multiple levels and dimensions. In INTERSPEECH-2011, 3081-3084.
Lewandowski, N. (2012). Talent in nonnative phonetic convergence. Unpublished doctoral dissertation, Universität Stuttgart.
Liberman, M. (2005). Young men talk like old women. Language Log blog post, November 6, 2005. Retrieved March 7, 2015 from http://itre.cis.upenn.edu/~myl/languagelog/archives/002629.html.
Liberman, M. (2014). UM/UH update. Language Log blog post, December 13, 2014. Retrieved March 7, 2015 from http://languagelog.ldc.upenn.edu/nll/?p=16414.
Liddicoat, A. J. (2007). An introduction to conversation analysis. London: Continuum.
Lindblom, B. (1990). Explaining phonetic variation: A sketch of the H & H theory. In W. J. Hardcastle & A. Marchal (Eds.), Speech production and speech modelling (pp. 403-439). Dordrecht: Kluwer Academic.
Lively, S. E., Pisoni, D. B., Van Summers, W., & Bernacki, R. H. (1993). Effects of cognitive workload on speech production: acoustic analyses and perceptual consequences. Journal of the Acoustical Society of America, 93, 2962-2973.
Lobben, A. K. (2004). Tasks, strategies, and cognitive processes associated with navigational map reading: a review perspective. The Professional Geographer, 56(2), 270-281.
Mattys, S. L., White, L., & Melhorn, J. F. (2005). Integration of speech segmentation cues: a hierarchical framework. Journal of Experimental Psychology: General, 134(4), 477-500.
Mattys, S. L., Brooks, J., & Cooke, M. (2009). Recognizing speech under a processing load: dissociating energetic from informational factors. Cognitive Psychology, 59, 203-243.
Mattys, S. L., & Wiget, L. (2011). Effects of cognitive load on speech recognition. Journal of Memory and Language, 65, 145-160.
Mattys, S. L., Davis, M. H., Bradlow, A. R., & Scott, S. K. (2012). Speech recognition in adverse conditions: a review. Language and Cognitive Processes, 27(7-8), 953-978.
Mattys, S. L., Barden, K., & Samuel, A. G. (2014). Extrinsic cognitive load impairs low-level speech perception. Psychonomic Bulletin & Review, 21, 748-754.
Meltzoff, A. N., & Moore, M. K. (1977). Imitation of facial and manual gestures by human neonates. Science, 198(4312), 75-78.
Meltzoff, A. N., & Moore, M. K. (1994). Imitation, memory, and the representation of persons. Infant Behavior and Development, 17, 83-99.
Mendoza, E., & Carballo, G. (1998). Acoustic analysis of induced vocal stress by means of cognitive workload tasks. Journal of Voice, 12(3), 263-273.
Miller, J. L., & Grosjean, F. (1981). How the components of speaking rate influence perception of phonetic segments. Journal of Experimental Psychology: Human Perception and Performance, 7(1), 208-215.
Montello, D. R., Lovelace, K. L., Golledge, R. G., & Self, C. M. (1999). Sex-related differences and similarities in geographic and environmental spatial abilities. Annals of the Association of American Geographers, 89(3), 515-534.
Moreau, D. (2013). Differentiating two- from three-dimensional mental rotation training effects. The Quarterly Journal of Experimental Psychology, 66, 1399-1413.
Namy, L. L., Nygaard, L. C., & Sauerteig, D. (2002). Gender differences in vocal accommodation: the role of perception. Journal of Language and Social Psychology, 21(4), 422-432.
Natale, M. (1975a). Convergence of mean vocal intensity in dyadic communication as a function of social desirability. Journal of Personality and Social Psychology, 32(5), 790-804.
Natale, M. (1975b). Social desirability as related to convergence of temporal speech patterns. Perceptual and Motor Skills, 40, 827-830.
Nielsen, K. (2011). Specificity and abstraction of VOT imitation. Journal of Phonetics, 39, 132-142.
O'Connell, D. C., & Kowal, S. (2005). Uh and um revisited: are they interjections for signaling delay? Journal of Psycholinguistic Research, 34(6), 555-576.
Oviatt, S. (1995). Predicting spoken disfluencies during human-computer interaction. Computer Speech and Language, 9, 19-35.
Owens, G., Granader, Y., Humphrey, A., & Baron-Cohen, S. (2008). LEGO® therapy and the Social Use of Language Programme: An evaluation of two social skills interventions for children with high functioning autism and Asperger syndrome. Journal of Autism and Developmental Disorders, 38, 1944-1957.
Papert, S. (1980). Mindstorms: computers, children, and powerful ideas. New York: Basic Books.
Pardo, J. S. (2006). On phonetic convergence during conversational interaction. Journal of the Acoustical Society of America, 119, 2382-2393.
Pardo, J. (2013). Reconciling diverse findings in studies of phonetic convergence. Proceedings of Meetings on Acoustics, 19, 060140. DOI: 10.1121/1.4798479.
Pardo, J. S., Cajori Jay, I., & Krauss, R. M. (2010). Conversational role influences speech imitation. Attention, Perception, & Psychophysics, 72, 2254-2264.
Pardo, J. S., Gibbons, R., Suppes, A., & Krauss, R. M. (2012). Phonetic convergence in college roommates. Journal of Phonetics, 40, 190-197.
Pardo, J. S., Cajori Jay, I., Hoshino, R., Hasbun, S. M., Sowemimo-Coker, C., & Krauss, R. M. (2013a). Influence of role-switching on phonetic convergence in conversation. Discourse Processes, 50(4), 276-300.
Pardo, J. S., Jordan, K., Mallari, R., Scanlon, C., & Lewandowski, E. (2013b). Phonetic convergence in shadowed speech: The relation between acoustic and perceptual measures. Journal of Memory and Language, 69, 183-195.
Piaget, J. (1999). Play, dreams, and imitation in childhood. Translated by C. Gattegno and F. M. Hodgson. London: Routledge. (Original work published 1951)
Pickering, M. J., & Garrod, S. (2004a). Toward a mechanistic psychology of dialogue. Behavioral and Brain Sciences, 27, 169-190.
Pickering, M. J., & Garrod, S. (2004b). The interactive-alignment model: developments and refinements. Behavioral and Brain Sciences, 27, 212-219.
Pickering, M. J., & Garrod, S. (2013). An integrated theory of language production and comprehension. Behavioral and Brain Sciences, 36, 329-347.
Putman, W. B., & Street Jr., R. L. (1984). The conception and perception of noncontent speech performance: implications for speech accommodation theory. International Journal of the Sociology of Language, 46, 97-114.
Resnick, M., Ocko, S., & Papert, S. (1988). LEGO, Logo, and design. Children's Environments Quarterly, 5(4), 14-18.
Resnick, M., Martin, F., Sargent, R., & Silverman, B. (1996). Programmable bricks: toys to think with. IBM Systems Journal, 35(3&4), 443-452.
Richardson, M. J., Marsh, K. L., Isenhower, R. W., Goodman, J. R. L., & Schmidt, R. C. (2007). Rocking together: dynamics of intentional and unintentional interpersonal coordination. Human Movement Science, 26, 867-891.
Rosenfelder, I., Fruehwald, J., Evanini, K., & Yuan, J. (2011). FAVE (Forced Alignment and Vowel Extraction) program suite. Philadelphia, PA: University of Pennsylvania Linguistics Lab. Accessed via http://fave.ling.upenn.edu/index.html.
Sacks, H., Schegloff, E. A., & Jefferson, G. (1974). A simplest systematics for the organization of turn-taking for conversation. Language, 50(4), 696-735.
Schachter, S., Christenfeld, N., Ravina, B., & Bilous, F. (1991). Speech disfluency and the structure of knowledge. Journal of Personality and Social Psychology, 60(3), 362-367.
Schegloff, E. A. (1981). Discourse as an interactional achievement: some uses of 'uh huh' and other things that come between sentences. In D. Tannen (Ed.), Analyzing Discourse: Text and Talk. Georgetown University Round Table on Languages and Linguistics 1981 (pp. 71-93). Washington, D.C.: Georgetown University Press.
Scherer, K. R., Grandjean, D., Johnstone, T., Klasmeyer, G., & Bänziger, T. (2002). Acoustic correlates of task load and stress. In Proceedings of the 7th International Conference of Spoken Language Processing (ICSLP2002), Denver, CO, 2017-2020.
Schilperoord, J. (2002). On the cognitive status of pauses in discourse production. In T. Olive & C. M. Levy (Eds.), Contemporary Tools and Techniques for Studying Writing (pp. 61-87). London: Kluwer.
Schneider, W., Eschman, A., & Zuccolotto, A. (2007). E-Prime User's Guide. Version 2.0. Pittsburgh: Psychology Software Tools Inc.
Shepard, R. N., & Metzler, J. (1971). Mental rotation of three-dimensional objects. Science, 171(3972), 701-703.
Shockley, K., Sabadini, L., & Fowler, C. A. (2004). Imitation in shadowing words. Perception and Psychophysics, 66(3), 422-429.
Shumway, R. H., & Stoffer, D. S. (2014). Time series analysis and its applications. New York: Springer. Online version accessed 27 September 2014 via http://resolve.library.ubc.ca/cgi-bin/catsearch?bid=7653778.
Sonderegger, M. (2012). Phonetic and phonological dynamics on reality television. Unpublished doctoral dissertation, University of Chicago.
Stevens, S. S., Volkmann, J., & Newman, E. B. (1937). A scale for the measurement of the psychological magnitude pitch. Journal of the Acoustical Society of America, 8(3), 185-190.
Street Jr., R. L. (1982). Evaluation of noncontent speech accommodation. Language and Communication, 2(1), 13-31.
Thakerar, J. N., Giles, H., & Cheshire, J. (1982). Psychological and linguistic parameters of speech accommodation theory. In C. Fraser & K. R. Scherer (Eds.), Advances in the social psychology of language (pp. 205-255). Cambridge: Cambridge University Press.
Tolkmitt, F. J., & Scherer, K. R. (1986). Effect of experimentally induced stress on vocal parameters. Journal of Experimental Psychology: Human Perception and Performance, 12(3), 302-313.
Toro, J. M., Sinnett, S., & Soto-Faraco, S. (2005). Speech segmentation by statistical learning depends on attention. Cognition, 97, B25-B34.
Tottie, G. (2011). Uh and um as sociolinguistic markers in British English. International Journal of Corpus Linguistics, 16(2), 173-197.
Triandis, H. C. (1960). Cognitive similarity and communication in a dyad. Human Relations, 13, 175-183.
Turner, M. L., & Engle, R. W. (1989). Is working memory capacity task dependent? Journal of Memory and Language, 28, 127-154.
Unsworth, N., Heitz, R. P., Schrock, J. C., & Engle, R. W. (2005). An automated version of the operation span task. Behavior Research Methods, 37, 498-505.
Van Engen, K. J., Baese-Berk, M., Baker, R. E., Choi, A., Kim, M., & Bradlow, A. R. (2010). The Wildcat corpus of native- and foreign-accented speech: communicative efficiency across conversational dyads with varying language alignment profiles. Language and Speech, 53(4), 510-540.
Wade, T., Dogil, G., Schütze, H., Walsh, M., & Möbius, B. (2010). Syllable frequency effects in a context-sensitive production model. Journal of Phonetics, 38, 227-239.
Wells, J. C. (1982). Accents of English (3 vols.). Cambridge: Cambridge University Press.
Wilson, M., & Wilson, T. P. (2005). An oscillator model of the timing of turn-taking. Psychonomic Bulletin and Review, 12(6), 957-968.
Wittenburg, P., Brugman, H., Russel, A., Klassmann, A., & Sloetjes, H. (2006). ELAN: a professional framework for multimodality research. Nijmegen: Max Planck Institute for Psycholinguistics, The Language Archive. Version 4.6.1, retrieved June 17, 2013 from http://tla.mpi.nl/tools/tla-tools/elan/.
Wright, P., & Hull, A. J. (1990). How people give verbal instructions. Applied Cognitive Psychology, 4, 153-174.
Yu, A. C. L. (2010). Perceptual compensation is correlated with individuals' "autistic" traits: Implications for models of sound change. PLoS One, 5(8): e11950. DOI: 10.1371/journal.pone.0011950.
Yu, A. C. L., Grove, J., Martinović, M., & Sonderegger, M. (2011). Effects of working memory capacity and "autistic" traits on phonotactic effects in speech production. In E. Zee (Ed.), Proceedings of the 17th International Congress of Phonetic Sciences, Hong Kong, 2236-2239.
Yu, A. C. L., Abrego-Collier, C., & Sonderegger, M. (2013). Phonetic imitation from an individual-difference perspective: Subjective attitude, personality and "autistic" traits. PLoS One, 8(9): e74746. DOI: 10.1371/journal.pone.0074746.
Zajonc, R. B., Pietromonaco, P., & Bargh, J. (1982). Independence and interaction of affect and cognition. In M. S. Clark & S. T. Fiske (Eds.), Affect and Cognition: the Seventeenth Annual Carnegie Symposium on Cognition (pp. 211-227). Hillsdale, NJ: Lawrence Erlbaum Associates.

Appendices

Appendix A: LEGO construction designs

A.1 Easy design
[Step-by-step assembly images for the Easy construction design; not reproducible in this text version.]

A.2 Medium design
[Step-by-step assembly images for the Medium construction design; not reproducible in this text version.]

A.3 Hard design
[Step-by-step assembly images for the Hard construction design; not reproducible in this text version.]

A.4 Inventory of LEGO pieces used

Table A.1 Inventory of pieces in designs
LEGO Digital Design Name | LEGO Digital Design Colour | Predicted Participant Colour Name | Quantity | Design 1 | Design 2 | Design 3
Brick Ø16 W. Cross | Black | Black | 1 | yes | yes | yes
Plate 1x6 | Black | Black | 1 | yes | yes | yes
Plate 4x6 | Black | Black | 1 | yes | yes | yes
Slanting Standard 2x4/2x2 | Black | Black | 1 | yes | yes | yes
Brick 1x8 | Bright Blue | Blue | 1 | yes | yes | yes
Brick 2x8 | Bright Blue | Blue | 1 | yes | yes | yes
Brick 4x12 | Bright Blue | Blue | 1 | yes | yes | yes
Plate 2x2 | Bright Blue | Blue | 1 | yes | yes | yes
Plate 2x4 | Bright Blue | Blue | 1 | yes | yes | yes
Roof Tile 2x2/45° | Bright Blue | Blue | 1 | yes | yes | yes
Brick Ø16 W. Cross | Bright Blue | Blue | 1 | - | - | yes
Plate 2x2 Round | Reddish Brown | Brown | 1 | - | - | yes
Plate 2x10 | Bright Green | Green/Dark Green | 1 | yes | yes | yes
Round Brick 1x1 | Bright Green | Green/Dark Green | 1 | - | - | yes
Brick 1x8 | Medium Stone Grey | Grey | 1 | yes | yes | yes
Plate 4x6 | Medium Stone Grey | Grey | 1 | yes | yes | yes
Rocket Step 4x4x2 | Medium Stone Grey | Grey | 1 | yes | yes | yes
Stem Plate 7x6 W/Cor. | Medium Stone Grey | Grey | 1 | yes | yes | yes
Undercarriage 2x2x2 | Medium Stone Grey | Grey | 1 | yes | yes | yes
Brick 2x4 | Medium Blue | Light Blue/Pale Blue | 2 | yes | yes | yes
Brick 1x2 | Bright Yellowish Green | Light Green | 1 | yes | yes | yes
Brick 2x4 | Bright Yellowish Green | Light Green | 1 | yes | yes | yes
Brick 2x4 | Cool Yellow | Light Yellow/Pale Yellow | 1 | yes | yes | yes
Brick 2x4 | Bright Orange | Orange | 1 | yes | yes | yes
Brick 2x4 | Bright Reddish Violet | Purple | 1 | yes | yes | yes
Brick 2x4 | Bright Red | Red | 1 | yes | yes | yes
Brick 2x8 | Bright Red | Red | 1 | yes | yes | yes
Plate 6x10 | Bright Red | Red | 1 | yes | yes | yes
Plate 1x6 | White | White | 1 | yes | yes | yes
Plate 2x2 Angle | White | White | 1 | yes | yes | yes
Plate 2x2 Round | White | White | 1 | yes | yes | yes
Plate 2x4 | White | White | 1 | yes | yes | yes
Stick/Aerial | White | White | 1 | - | - | yes
Brick 1x4 | Bright Yellow | Yellow | 1 | yes | yes | yes
Brick Ø16 W. Cross | Bright Yellow | Yellow | 1 | yes | yes | yes
Glass Case | Bright Yellow | Yellow | 1 | yes | yes | yes
Left Roof Tile 2x3 | Bright Yellow | Yellow | 1 | yes | yes | yes
Plate 6x10 | Bright Yellow | Yellow | 1 | yes | yes | yes
Right Roof Tile 2x3 | Bright Yellow | Yellow | 1 | yes | yes | yes

Table A.2 Inventory of extra pieces
LEGO Digital Design Name | LEGO Digital Design Colour | Predicted Participant Colour Name | Quantity
Brick 2x4 | Bright Blue | Blue | 1
Brick 2x4 | Reddish Brown | Brown | 1
Brick 2x3 | Bright Red | Red | 1
Brick 1x2 | Bright Green | Green/Dark Green | 1
Brick Ø16 W. Cross | White | White | 1
Brick Ø16 W. Cross | Transparent Yellow | Transparent/Translucent Yellow | 1
Plate 1x6 | Bright Yellow | Yellow | 1
Plate 2x4 | Black | Black | 1
Roof Tile 2x2/45° Inv. | Bright Blue | Blue | 1
Nose Cone 2x2x2 | Medium Stone Grey | Grey | 1

Appendix B: Participant results on personality and cognitive measures

B.1 Big Five and Autism Spectrum Quotient (AQ) scores
Participant | Agreeableness | Conscientiousness | Extraversion | Neuroticism | Openness | AQ
101 | 4.89 | 2.89 | 4 | 3.75 | 4.1 | 20
102 | 4.56 | 4.11 | 3.375 | 2.75 | 4.2 | 13
103 | 3.33 | 3.78 | 1.75 | 4.375 | 3.8 | 31
104 | 4 | 3 | 3.875 | 2.75 | 4.2 | 12
105 | 4.22 | 3.44 | 3.625 | 3.125 | 3.7 | 16
106 | 4.22 | 3 | 3.375 | 2.125 | 4.2 | 12
107 | 3.22 | 2.78 | 4.125 | 3.875 | 4.3 | 16
108 | 4.11 | 3.56 | 1.625 | 4.125 | 3.8 | 25
109 | 4.22 | 3.33 | 4 | 3 | 3.4 | 19
110 | 3.78 | 3.67 | 2.5 | 4.125 | 4.1 | 17
111 | 3.44 | 2.56 | 3.25 | 2.75 | 4 | 12
112 | 4.44 | 3.67 | 4.25 | 2.875 | 3.8 | 15
113 | 3 | 3.89 | 3 | 4.125 | 3.5 | 19
114 | 3.22 | 3.11 | 2.875 | 4.625 | 3.5 | 9
115 | 3.78 | 3.44 | 1.625 | 3.625 | 3.6 | 26
116 | 3.11 | 2.44 | 4.125 | 4.125 | 4.4 | 22
117 | 3.78 | 3.44 | 3.375 | 2.5 | 4.8 | 16
118 | 4.89 | 3.33 | 2.375 | 4.5 | 4 | 25
119 | 3.78 | 2.67 | 2.625 | 4.375 | 3 | 17
121 | 4.44 | 4 | 4 | 2 | 4 | 12
124 | 3.22 | 3.11 | 2.625 | 4 | 4.2 | 14
125 | 4.33 | 3.56 | 2.75 | 3 | 3.3 | 20
126 | 3.44 | 3.22 | 3.125 | 3.75 | 2.7 | 21
127 | 3.33 | 2.78 | 2.25 | 3.625 | 3.8 | 27
128 | 3.89 | 3.89 | 1.5 | 3.625 | 3.6 | 23
129 | 4.44 | 2.33 | 4.25 | 2.875 | 3.8 | 12
130 | 4.33 | 3.56 | 3.625 | 2.875 | 3.8 | 10
131 | 2.89 | 4.22 | 2.625 | 4 | 3.2 | 14
132 | 4 | 2.89 | 3.625 | 3.625 | 3.6 | 13
134 | 3.78 | 3.89 | 4.25 | 1.875 | 4 | 11

B.2 RSPAN and mental rotation scores
Participant | RSPAN Absolute | RSPAN Total Correct | RSPAN Reading Errors | RSPAN Speed Errors | RSPAN Accuracy Errors | Mental Rotation Accuracy | Mental Rotation Reaction Time (msec)
101 | 21 | 41 | 10 | 4 | 6 | 0.9 | 6612
102 | 39 | 59 | 1 | 1 | 0 | 0.87 | 8615
103 | 62 | 71 | 2 | 0 | 2 | 1 | 7032
104 | 49 | 65 | 1 | 1 | 0 | 0.87 | 6235
105 | 50 | 65 | 2 | 0 | 2 | 0.83 | 5797
106 | 63 | 70 | 2 | 0 | 2 | 0.77 | 14669
107 | 38 | 60 | 2 | 0 | 2 | 0.77 | 15571
108 | 46 | 61 | 3 | 1 | 2 | 0.97 | 10709
109 | 61 | 69 | 1 | 0 | 1 | 0.87 | 4941
110 | 58 | 68 | 4 | 0 | 4 | 0.87 | 8140
111 | 50 | 62 | 1 | 0 | 1 | 0.67 | 10304
112 | 36 | 61 | 1 | 1 | 0 | 0.97 | 12297
113 | 12 | 35 | 9 | 6 | 3 | 0.73 | 12652
114 | 38 | 62 | 1 | 0 | 1 | 1 | 5627
115 | 15 | 35 | 5 | 0 | 5 | 0.73 | 8052
116 | 50 | 65 | 3 | 2 | 1 | 0.57 | 7361
117 | 40 | 63 | 5 | 2 | 3 | 1 | 12785
118 | 61 | 71 | 5 | 1 | 4 | 0.7 | 5567
119 | 54 | 63 | 4 | 1 | 3 | 0.83 | 9712
121 | 41 | 63 | 1 | 0 | 1 | 0.9 | 3760
124 | 41 | 55 | 6 | 4 | 2 | 0.97 | 4030
125 | 35 | 62 | 2 | 1 | 1 | 1 | 3053
126 | 12 | 28 | 23 | 2 | 21 | 0.77 | 6395
127 | 68 | 74 | 2 | 0 | 2 | 0.73 | 11142
128 | 43 | 62 | 10 | 2 | 8 | 0.67 | 10336
129 | 48 | 61 | 0 | 0 | 0 | 0.87 | 5533
130 | 32 | 59 | 1 | 1 | 0 | 0.8 | 9856
131 | 31 | 52 | 3 | 0 | 3 | 0.9 | 8746
132 | 40 | 56 | 8 | 0 | 8 | 0.67 | 6129
134 | 62 | 69 | 3 | 0 | 3 | 0.5 | 2064

Appendix C: Language background information form for all participants

Language Background Questionnaire    Subject Number ______________

• Are you male or female? ______________
• What is your race (check all that apply): ___ First Nations  ___ Asian  ___ Pacific Islander  ___ Black  ___ White  ___ Hispanic  ___ South Asian
• What is your age? _____
• What cities or towns have you lived in? List first the place where you were born, and list each town or city you have lived in. (Please use the back of the sheet if you need more room.)
  birth until age ______ in town/city _________________________________
  age _____ until age ______ in town/city _________________________________
  age _____ until age ______ in town/city _________________________________
  age _____ until age ______ in town/city _________________________________
  age _____ until age ______ in town/city _________________________________
• What languages do you speak (include your native language(s))? When did you start learning this language? How would you rate your proficiency in reading, writing, speaking, and understanding it? (1) not at all, (2) poorly, (3) fairly well, (4) fluently.
  language ______________  age _____  reading 1 2 3 4  writing 1 2 3 4  speaking 1 2 3 4  understanding 1 2 3 4
  language ______________  age _____  reading 1 2 3 4  writing 1 2 3 4  speaking 1 2 3 4  understanding 1 2 3 4
  language ______________  age _____  reading 1 2 3 4  writing 1 2 3 4  speaking 1 2 3 4  understanding 1 2 3 4
  language ______________  age _____  reading 1 2 3 4  writing 1 2 3 4  speaking 1 2 3 4  understanding 1 2 3 4
• Do you have any speech or hearing disorders?  no / yes  If "yes", please specify.
• Do you have any difficulties with colour vision?  no / yes
• Are you right-handed or left-handed? _____________________
• What is the highest level of education that you have completed? _____________________________________
• What are the professions of your parents or other caregivers?

Appendix D: Post-task questionnaire for construction task

Post-Task Questionnaire    Subject Number ____________

Your answers to these questions will only be seen by the researchers. If you are not sure what the questions are asking, please ask the experimenter.

Questions

1. What is your major?

2. Did you play with LEGO or similar construction toys when you were a child? If you played with other similar toys, what were they called?

3. Using the scale below, indicate how often you are in situations where you have to give verbal instructions to someone.
   0 1 2 3 4 5 6 7 8 9 10  (0 = Never; 10 = Several times a day)
   If you do have to give verbal instructions, in what kind of situations do you have to do so?

4. Using the scale below, indicate how often you are in situations where you have to follow verbal instructions which someone is giving to you.
   0 1 2 3 4 5 6 7 8 9 10  (0 = Never; 10 = Several times a day)
   If you do have to follow verbal instructions, in what kind of situations do you have to do so?

5. Did you enjoy the task you worked on today? Why or why not?
6. Using the scale below, indicate how difficult you found the task you worked on today to be.
   0 1 2 3 4 5 6 7 8 9 10  (0 = The easiest thing I've done today; 10 = The most difficult thing I've done today)

7. As the task went along, did you find that it got easier, harder, or stayed at about the same level of difficulty? Why do you think that was?

8. Using the scale below, indicate how well you think you and your partner worked together.
   0 1 2 3 4 5 6 7 8 9 10  (0 = We didn't work well together at all; 10 = We worked very well together)
   Please explain why you chose this rating.

9. Using the scale below, indicate how difficult you found it to follow your partner's instructions.
   0 1 2 3 4 5 6 7 8 9 10  (0 = Very easy to follow; I knew exactly what she meant. 10 = Very difficult to follow; I didn't know what she meant at all.)
   Please explain why you chose this rating.

10. Would you want to work with this partner again on similar tasks? Why or why not?

11. Would you want to work with this partner on other kinds of tasks? If yes, what kinds of tasks? If no, why not?

12. Would you want to meet this partner in a social situation? Why or why not?

13. Please indicate with a check mark below how often you do or have done the following activities. For each activity, mark Regularly, Sometimes, or Never, both for now and for the past (more than a year ago):
- Build with LEGO or similar construction toys using instructions
- Build with LEGO or similar construction toys without instructions (free building)
- Cook/bake with a written recipe
- Cook/bake without a written recipe
- Sew/knit/crochet (etc.) following a pattern
- Sew/knit/crochet (etc.) without a pattern
- Needlepoint/cross-stitch
- Physical puzzles (e.g., jigsaw puzzles, Rubik's cube)
- Mental puzzles (e.g., crosswords, Sudoku)
- Computer/video game puzzles (e.g., Tetris)
- Build models from a kit
- Build models from scratch (without a kit)
- Put together Ikea-type furniture with the instructions
- Put together Ikea-type furniture without the instructions
- Use a map/Google map to get directions
- Use GPS to get directions
- Play a musical instrument with a group
- Sing with a group or a band
- Dance with a partner or a group
- Play a sport with a partner or a team
- Do a sport or activity which many people consider 'dangerous'
- Wear a uniform for work, school, or for an organization you are part of
- 'Individualize' your work/school/organization uniform
- Change or reschedule your plans in order to do something fun
- Change or reschedule your plans in order to help a friend/family member/co-worker
- Go to an event you're not really interested in because of a friend/family member
- Go to an event by yourself where you don't know anyone
- Try something because a friend/family member/co-worker said it would be fun
- Try something no one you know has tried
- Travel with a group
- Travel on your own

Appendix E: Stimuli for perception task and acoustic similarity analyses

E.1 Easy condition stimuli
Dyad | Conversation Third | Talker 1 Phrase (A) / Talker 2 Phrase (X)
E1 | 1 | half of the blue on the right side
E1 | 1 | I think it's the only one like one red circle
E1 | 1 | there should be ten on the red side
E1 | 2 | on the left and right side so you're only covering two by two
E1 | 2 | on top of the yellow so you leave that on top
E1 | 2 | it should be covering it all the holes are covering it
E1 | 3 | forget about the yellow there's another grey piece
E1 | 3 | circular piece right on the bottom
E1 | 3 | underneath the white piece okay it's a grey piece
E2 | 1 | there's a blue piece right side
E2 | 1 | four rows down on the left
E2 | 1 | at the top part it doesn't look like it
E2 | 2 | right in the centre it looks funny
E2 | 2 | that white circular piece two by fours
E2 | 2 | on the red piece this is interesting
E2 | 3 | the one that looks really weird two circular ends
E2 | 3 | one row of four right at the front there
E2 | 3 | hanging off there that chair
E3 | 1 | facing you on the bottom
E3 | 1 | the short end underneath that
E3 | 1 | on the yellow and the black take that blue piece
E3 | 2 | two dots on each side five rows of dots
E3 | 2 | on the yellow side longest edge
E3 | 2 | just the one hole widest edge
E3 | 3 | the three dots a little grey dumbbell
E3 | 3 | just one row of four the grey funny shape
E3 | 3 | the white and the black one dark green plank
E4 | 1 | black flat one the blue one
E4 | 1 | counting from the top the very right
E4 | 1 | the yellow one round piece
E4 | 2 | over the red yellow piece
E4 | 2 | at the very top these two pieces
E4 | 2 | the fat end and that's it
E4 | 3 | it's like centred on the bottom
E4 | 3 | blue piece white piece
E4 | 3 | of the triangle one by two
E5 | 1 | it's in the very middle or eight circles
E5 | 1 | I don't know how this works closest to you
E5 | 1 | the diagram has one the skinny long piece
E5 | 2 | hanging off of it the steeper angle
E5 | 2 | grey piece on the tower part
E5 | 2 | sorta like a chair make that side face you
E5 | 3 | it's just one from the yellow piece
E5 | 3 | I think there's only one of them and it's like a rectangle
E5 | 3 | and it just goes on the grey piece it's the smaller cone

E.2 Medium condition stimuli
Dyad | Conversation Third | Talker 1 Phrase (A) / Talker 2 Phrase (X)
M1 | 1 | two by four eight by two
M1 | 1 | eight red ones one line of red
M1 | 1 | on the very edge line it up
M1 | 2 | the base of the wine glass three red dots
M1 | 2 | it's the other part two by four
M1 | 2 | to the left in the middle
M1 | 3 | to the right towards you
M1 | 3 | there's nothing hanging off it should be in the same direction
M1 | 3 | it all works again in the same direction
M2 | 1 | right at the edge so it's like centred
M2 | 1 | and connect the two pale yellow
M2 | 1 | it is going vertically on the red bottom piece
M2 | 2 | I think that's it yeah it's a two layer
M2 | 2 | the eight circles and now they line up
M2 | 2 | the light blue three rows of grey
M2 | 3 | so like where you would sit they're alongside the blue
M2 | 3 | facing to the right no that's more confusing
M2 | 3 | like a cone symmetrically
M3 | 1 | four by two four by two
M3 | 1 | this is interesting take the light yellow
M3 | 1 | on the inner column put it on top of the green
M3 | 2 | not the dark blue towards the left
M3 | 2 | on top of the platform and I think that's all
M3 | 2 | the lime green two by four
M3 | 3 | the black one behind the turbine
M3 | 3 | the grey panel no I have it on the right
M3 | 3 | it's the other grey one I think that's it
M4 | 1 | the blue piece that I put on the red and the grey
M4 | 1 | sort of a square shape on top of the blue
M4 | 1 | a thin one on the red
M4 | 2 | on the very left side not the clear one
M4 | 2 | instead of horizontally the blue underneath
M4 | 2 | the yellow block if that makes sense
M4 | 3 | don't stick it on anything yet that's kinda like a bell
M4 | 3 | two grey pegs facing you
M4 | 3 | grey piece shaped like an arrow
M5 | 1 | at the very top of them on one side
M5 | 1 | in the middle of the two and it's placed one up
M5 | 1 | a bright purple one but it's yellow
M5 | 2 | directly in the middle not the one that looks like a helmet
M5 | 2 | a thing on the end of an airplane I mean there's no flat one
M5 | 2 | it just kinda balances there solid yellow colour
M5 | 3 | four piece thing those are the three pieces
M5 | 3 | just the same direction the rectangular piece
M5 | 3 | to the right to the very right

E.3 Hard condition stimuli
Dyad | Conversation Third | Talker 1 Phrase (A) / Talker 2 Phrase (X)
H1 | 1 | on the left on the outside
H1 | 1 | mine says three and then on the other side
H1 | 1 | the flat blue one it's like one by eight
H1 | 2 | like a stool I see what you mean
H1 | 2 | bottom pieces there should be four
H1 | 2 | the brown piece right four by two
H1 | 3 | I should clarify that might be it
H1 | 3 | jutting out from the middle the grey's on the other side
H1 | 3 | some sort of ship 'cause there's four green
H2 | 1 | the thicker blue on the left side
H2 | 1 | it's not flat though the same shape as the blue one
H2 | 1 | on those two dots the one with eight
H2 | 2 | from one end to the other directly on top
H2 | 2 | yellow circle up and down
H2 | 2 | the one on the right side so like a single row
H2 | 3 | it's a little cylinder a round top
H2 | 3 | onto the back of the chair below the yellow ones
H2 | 3 | it's big and flat there's two of them
H3 | 1 | they'll be joined together a black rectangle
H3 | 1 | put them side by side on the very left side
H3 | 1 | on your left blue rectangle
H3 | 2 | between them two pegs
H3 | 2 | at the very back on the very right side
H3 | 2 | the lime green piece one red
H3 | 3 | the narrow bit at the end
H3 | 3 | on top three pegs
H3 | 3 | the wider bit on the front of the L
H4 | 1 | the green one's at the top underneath the red
H4 | 1 | two little towers the very left
H4 | 1 | the blue piece dark blue
H4 | 2 | the green and the white and then four up and down
H4 | 2 | a little round table they flip it around
H4 | 2 | the widest part the solid yellow
H4 | 3 | the round blue the grey cone to your right
H4 | 3 | I actually think that's it the yellow wings
H4 | 3 | facing to your left the lime green stack
H5 | 1 | it's like light yellow with the black
H5 | 1 | bottom row of four that space isn't gonna be filled
H5 | 1 | the same as the green one on the outside
H5 | 2 | like four across the green piece
H5 | 2 | so look at the red piece so you're putting it right in the middle
H5 | 2 | directly from above so now you're done with up there
H5 | 3 | I'm just trying to think and then the red piece
H5 | 3 | that's supposed to be brown but on the yellow side
H5 | 3 | really small green piece and the skinnier part is at the bottom

Appendix F: Syllabification of words not in the CMU Pronouncing Dictionary
Word | Number of Syllables
alrighty | 3
bajillion | 3
blue-y | 2
boat-like | 2
chair's | 1
circle-y | 3
colour's | 2
crane-y | 2
cylinder's | 3
diffuser | 3
divots | 2
drumbell | 2
dumbell's | 2
d'you | 2
forty-eight | 3
forty-five | 3
fuchsia | 2
gajillion | 3
greys | 1
hatlike | 2
hm | 1
holdin' | 2
inwards | 2
knobbies | 2
lampshade | 2
leftmost | 2
Legos | 2
lilac-y | 3
middles | 2
mkay | 2
mm | 1
m-m [mʔm] | 2
movin' | 2
oopsies | 2
orange-y | 3
orientation's | 5
overexplaining | 5
overtop | 3
perpendicularly | 6
purple-y | 3
riggy | 2
rightmost | 2
sh | 1
skinnies | 2
slants | 1
slanty | 2
s'okay | 2
sorta | 2
symmetrically | 4
them's | 1
thingies | 2
things'll | 2
tip's | 1
topmost | 2
trapezoid | 3
trapezoids | 3
triangle-ish | 4
triangle-y | 4
triangular-ish | 5
tryna | 2
twenty-four | 3
uh-oh | 2
widthwise | 2
wooh | 1
yellow's | 2
y'know | 2
