UBC Theses and Dissertations

Visual discrimination of French and English in inter-speech and speech-ready position D'Aquisto, Joseph Paul 2014

Visual Discrimination of French and English in Inter-Speech and Speech-Ready Position

by

Joseph Paul D’Aquisto

A.A.S., Computer Information Systems, A.B. Tech Community College, 2002
B.A. Honors, Linguistics & Russian, University of Arizona, 2011

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF ARTS

in

The Faculty of Graduate and Postdoctoral Studies

(Linguistics)

THE UNIVERSITY OF BRITISH COLUMBIA
(Vancouver)

August 2014

© Joseph Paul D’Aquisto, 2014

Abstract

This study investigates the ability of observers to discriminate between French and English using visual-only stimuli. This study differs from prior studies because it specifically uses inter-speech (ISP) and speech-ready tokens rather than full sentences. The main purpose of this research was to answer if observers could successfully discriminate French from English by watching video clips of speakers engaged in ISP and speech-ready positions with the audio removed. Two experiments were conducted; the first experiment focuses on native English vs. non-native English speakers and the second experiment focuses on native English vs. native French speakers which expands further on the data in the first experiment. The results support the view that observers can visually distinguish their native language even in the absence of segmental information.

Preface

All of the work presented in this thesis was conducted at facilities of the University of British Columbia Department of Linguistics. This research was conducted with approval by the University of British Columbia’s Research Ethics Board as part of the research project entitled “Processing Complex Speech Motor Tasks” under the certificate numbers: H04-80337 and B04-0337 supervised by principal investigator Bryan Gick. I was the lead researcher on this part of the project and responsible for all major areas of concept formation, data collection and analysis, as well as the majority of manuscript composition. Bryan Gick was my supervisor and additional committee members were Eric Vatikiotis-Bateson and Rose-Marie Déchaine who were involved throughout the project in discussion and manuscript edits.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Acknowledgements
1 Introduction
    1.1. Background on ISP
        1.1.1 Ultrasound Studies of ISP: Gick et al. 2004
        1.1.2 Ultrasound and Optotrak Studies of ISP: Wilson 2006
        1.1.3 Ultrasound Studies of Bilingual-Mode ISP
        1.1.4 Electropalatographic Studies of ISP: Schaeffler et al. 2008
        1.1.5 MRI Studies of ISP
    1.2. Visual Speech Information From Facial Movements
        1.2.1 The Contribution of Visual Speech Information
        1.2.2 Language Identification From Visual-Only Cues
        1.2.3 The Bilingual Advantage in Language Identification of Visual Stimuli
        1.2.4 Increased Cognitive Load of Visual-Only Speech
    1.3. Linguistic Information From Facial Movements & Facial Recognition
    1.4. Goal & Questions of This Study
2 Native/Non-Native Perceivers Pilot
    2.1. Methods
    2.2. Results
    2.3. Discussion
3 Native English vs. Native French Perceivers
    3.1. Methods
    3.2. Results and Discussion
4 Conclusion
References

List of Tables

Table 2.1 Tokens Identified by at Least 5 Out of 6 Participants
Table 3.1 T-Test Results for English & French Groups
Table 3.2 Tokens Correctly/Incorrectly Identified

List of Figures

Figure 1.1 ISP% for Speakers
Figure 2.1 Correct Tokens Overall for English & French Groups
Figure 2.2 Overall % of Correct Tokens by Speaker & L1 Language
Figure 3.1 Accuracy % by Native Language
Figure 3.2 Confusion Matrix for Subjects’ Responses
Figure 3.3 Mean Accuracy Rate by Stimulus Language

Acknowledgements

I am grateful to the faculty and staff in the Department of Linguistics at the University of British Columbia who have provided me with the skills to work in this field. My supervisor Bryan Gick and committee members Eric Vatikiotis-Bateson and Rose-Marie Déchaine have been great mentors during my time at this institution. Thanks to my fellow students, researchers and research assistants in the Interdisciplinary Speech Research Lab.

I would also like to give special thanks to Kyle Danielson from UBC Psychology and Phoebe Wong from UBC Audiology for providing assistance with PsyScope, R, and statistics.

Chapter 1

Introduction

Prior research suggests that adults can discriminate between different languages based solely on visual signals. Ronquest et al. (2010) explored the visual cues that observers use to complete language-identification tasks in Spanish and English and found that observers could detect rhythmic differences between syllable-timed and stress-timed stimuli. Studies by Weikum et al. (2007) and Weikum et al. (2013) found a correlation between the age of acquisition and the ability to differentiate one’s own language from other languages using visual-only cues. Weikum et al. (2007) examined infants’ ability to discriminate French from English using silent video clips.
Four-month-old native English-speaking infants were studied to see if they could differentiate their native language (English) from an unfamiliar language (French). The 4-month-old infants were also compared against 6- to 8-month-old monolingual English and bilingual English-French infants to determine how perception accuracy is affected by age. The stimuli were sentences recited by three bilingual French-English speakers in each language. Looking time was used to determine whether infants were able to differentiate the languages. If an infant saw trials of different sentences from each language by the same speaker and their looking time increased, this indicated that they had noticed the language change. The results from Weikum et al. (2007) showed that 4- and 6-month-old monolingual infants looked significantly longer at language-switch trials than the 8-month-old monolingual infants. This finding supports the view that infants can visually distinguish their native language from an unfamiliar language at 4 and 6 months, but not at 8 months. It was also found that, of the 8-month-old infants, only the bilingual ones looked significantly longer at the language switch. This suggests that by 8 months of age, infants’ ability to discriminate a familiar language from an unfamiliar one declines. The fact that, of the 8-month-old infants, only the bilinguals who were familiar with both languages could discriminate French from English supports this interpretation. Weikum et al.
(2013) expand on Weikum et al. (2007) and explore how age of acquisition affects adults’ ability to discriminate English from French using visual-only stimuli. They used video clips of three balanced French/English bilinguals reciting sentences from the French and English versions of the book “The Little Prince”; the clips ranged from 8 to 13 seconds in length. Each clip had the sound removed after recording. Weikum et al.’s (2013) data showed that adults who had learned English as a first or second language between the ages of 0 and 6 were able to judge whether a speaker appearing in a silent video clip was speaking French or English. Adults who had learned English after the age of 6 failed to discriminate between the two languages. None of the participants had any knowledge of French.

The initial goal of this thesis was to replicate the study by Weikum et al. (2013) using inter-speech and speech-ready stimuli as opposed to full sentences. Inter-speech posture (ISP) is typically described in terms of the brief pauses that occur between stretches of speech while a speaker is producing an utterance; speech-ready position is an articulatory posture that a speaker assumes when he/she prepares to speak (Ramanarayanan et al., 2013; Gick et al., 2004).

1.1. Background on ISP

This section highlights some of the previous research on ISP and the methods used to explore it, which employ several technologies including ultrasound, X-ray, Optotrak, EPG and MRI.
Topics explored are the causes that contribute to ISPs and when they happen. How these properties differ across speakers in general is also explained, in addition to significant differences between bilinguals and monolinguals. The background is meant to give a history and describe the methods, theories and evidence surrounding ISP.

1.1.1 Ultrasound Studies of ISP: Gick et al. 2004

Gick et al. (2004) explored the existence of a ‘default setting’ or ‘posture’ for articulators and facial muscles when speaking a particular language. They manually measured x-ray films for the following data: pharynx width, velic aperture, tongue body distance from the hard palate (tongue dorsum constriction degree), tongue tip distance from the alveolar ridge (tongue tip constriction degree), lower-to-upper jaw distance, and upper and lower lip protrusion.

Another question raised was whether this default posture is specified as part of a language’s inventory, and thus learned from other speakers, or is a functionally derived property of speech motion. In other words, if there is a default posture, it may be part of a language’s inventory, a specified target that is held throughout the utterance and is continuously available to learners through the acoustic signal, or it may be used as a rest position that is a feature or property of that language (language-specific).
If it is not part of the language inventory and is instead functional, then it could be a matter of motor control; for instance, a speaker who uses a large number of postvelar sounds could display more retraction in their articulators.

1.1.2 Ultrasound and Optotrak Studies of ISP: Wilson 2006

Wilson (2006) explores ISP in Canadian French and Canadian English speakers. Two experiments were carried out using Optotrak and ultrasound imaging to address the question of whether ISP is language-specific in both monolingual and bilingual speakers and whether it is influenced by phonetic context and/or speech mode (monolingual or bilingual). The results from Wilson (2006) show significant differences between the two groups of speakers in the position of the articulators. In addition, Wilson’s (2006) data lend support to the notion that there is no distinct ISP in bilingual mode; rather, they favor the idea that a bilingual’s ISP in that mode is the ISP of the speaker’s currently most-used language.

1.1.3 Ultrasound Studies of Bilingual-Mode ISP

Wilson & Gick (2013) define bilingual mode as what occurs when a bilingual speaks with another bilingual and both languages are being used, whereas monolingual mode is described as what occurs when a bilingual is speaking to a monolingual with only one language being used. Additionally, it is also possible to have two bilinguals speaking to one another using only one language.
Wilson & Gick (2013) further tested the question proposed in Wilson (2006) of whether ISP is language-specific in both monolingual and bilingual speakers by having eight French-English bilinguals read English and French sentences. The participants’ ISPs were measured with optical tracking of the 3D positions of the lips and jaw, while ultrasound imaging was used to track tongue movements. The results from Wilson & Gick (2013) reinforce the hypothesis in Wilson (2006), namely that bilinguals use the ISP of their most dominant language when in bilingual mode.

1.1.4 Electropalatographic Studies of ISP: Schaeffler et al. 2008

Schaeffler et al. (2008) used electropalatographic (EPG) data to provide information on tongue-palate contact patterns during speech and non-speech activities such as swallowing and bracing. ISP data were taken from three different tasks performed by English speakers: 1) a read-speech task with single words presented on screen; 2) a picture-naming task with pictures presented on screen; 3) a semi-spontaneous map task that required speakers to describe a simple route to a listener.

ISP data were recorded by identifying significant change in the overall contact pattern from the time the prompt appeared to the acoustic onset of speech. An ISP was noted when the transition between the pre-prompt (non-speech) position and the first speech gesture was neither an interpolation nor a random movement.
For the read-speech and picture-naming tasks, data were gathered after an audible beep that lasted 2, 5 or 8 seconds before the orthographic or picture prompt appeared, and continued until after the end of the acoustic output. For the map task, data were gathered continuously.

Schaeffler et al. (2008) mention that EPG works well for cases such as when speakers hold part of their tongue against the roof of their mouth during non-speech, but that other tools such as ultrasound are needed for a more detailed understanding. Schaeffler et al. (2008) asked whether ISPs occurring in spontaneous pauses happen more or less often than those occurring in prompted pauses, and how long a pause has to be to give rise to a measurable ISP. A map task was used to determine this. To classify ISPs, Schaeffler et al. (2008) first identified whether any notable change occurred in the overall contact pattern between the onset of the prompt and the onset of the acoustic response. An ISP zone was categorized when the transition between the pre-prompt (non-speech) position and the first speech gesture was neither a mere interpolation nor a random movement. An ISP was identified in this phase if the kinematic record indicated a motion towards some configuration followed by smooth movement away from it towards the first segment, or a clear pause during a continuous motion. Schaeffler et al.
(2008) explored how different conditions affect the formation and dynamic structure of the ISP. Of particular interest was whether ISPs occurring in spontaneous pauses occur more or less often than those occurring in prompted pauses, and how long a pause has to be in order to have a measurable ISP.

The figure below is an illustration from Schaeffler et al. (2008) showing the percentage of ISPs for speakers in all three tasks.

Figure 1.1 - % ISP for Speakers for the 3 Tasks. Illustration taken from Schaeffler et al. (2008).

The figure shows the proportion of pauses with ISP zones relative to the total number of inter-speech pauses. 86 ISP zones were identified in 140 pause tokens (61.4% of tokens). ISP zones began approximately half a second after presentation of the prompt, 454 ms before the acoustic onset (s.d. 275 ms). There were significant differences between speakers (one-way ANOVA, F(2,85) = 6.892, p = .002; cf. Table 1) but not between speech tasks (i.e. picture naming vs. word list) or following segmental context (i.e. alveolar vs. non-alveolar vs. vocalic onset). Schaeffler et al. (2008) mention that between-speaker differences in speech rate may have had an effect here, but suspect that habitual differences between speakers were more likely to explain these results.

Schaeffler et al.
(2008) investigated the variability among speakers and found that the speaker least accustomed to wearing an artificial palate kept his tongue pressed against his palate when waiting for each prompt and when pausing naturally in the map task. Another speaker showed a rest position with little to no tongue-palate contact, which made ISPs for that speaker harder to identify. Schaeffler et al. (2008) were thus unable to estimate the effect of spontaneous vs. prompted speech on the formation of ISPs, and no clear conclusion could be drawn as to which task elicits reliable ISPs.

Wilson (2006) investigates whether ISP is language-specific in both monolingual and bilingual speakers of Canadian English and Québécois French using Optotrak and ultrasound imaging. Two experiments tested how ISP is related to phonetic context and speech mode (bilingual or monolingual). Wilson (2006) shows significant differences in ISP across English and French monolingual groups, with English monolinguals having a higher tongue tip, more protruded upper and lower lips, and a narrower horizontal lip aperture. An ultrasound monitor was used to view tongue movements in real time, along with an Optotrak 3020 (Northern Digital Inc.) optical tracking system that measured the 3D positions of the lips, jaw and head relative to the ultrasound probe.
The Optotrak numeric data and the ultrasound videos were the data sources measured, and both were processed in MATLAB; the data had to be pre-processed first, however. The first step required the DV ultrasound tape to be converted to Adobe Premiere movie files, which were resampled to ensure that later measurements were all on the same scale. The ultrasound movie files were then cropped so that the first frame of each file was the frame immediately after a clapper was heard, in order to establish which Optotrak frames corresponded to the ultrasound frames of interest. Possible periods of rest used for analysis were found by replaying the ultrasound movie files and searching after every sentence for a period of at least 10 frames of no tongue motion. A 10-frame period was chosen, rather than a longer or shorter one, because it was the longest period for which the tongue could be considered at rest in an average of about 50% of the ISPs across all 24 subjects. If such a period of 10 frames of no tongue motion existed, then the centre frame of that period was chosen as a "possible rest frame" for analysis (Wilson, 2006). English speakers’ jaw ISPs were found to be partially influenced by phonetic context, but the lip and tongue ISPs were not. Wilson (2006) found no significant difference between French and English ISP measurements for the jaw and velum.
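
The rest-frame selection rule described above (find a run of at least 10 frames with no tongue motion, then take its centre frame) can be illustrated with a minimal sketch. This is not Wilson's (2006) procedure, which relied on replaying the ultrasound movies; it assumes a hypothetical per-frame motion measure, and the function name, the `tol` threshold, and the choice to take the centre of the whole motionless run are all assumptions made for illustration.

```python
from typing import List


def possible_rest_frames(motion: List[float], min_run: int = 10,
                         tol: float = 0.0) -> List[int]:
    """Return the centre frame index of every motionless run of >= min_run frames.

    `motion[i]` is a hypothetical measure of tongue displacement at frame i;
    a frame counts as motionless when motion[i] <= tol (an assumed criterion).
    """
    centres: List[int] = []
    run_start = None
    # A sentinel value above any tolerance closes a run that reaches the last frame.
    for i, m in enumerate(list(motion) + [float("inf")]):
        if m <= tol:
            if run_start is None:
                run_start = i  # a new motionless run begins here
        else:
            if run_start is not None and i - run_start >= min_run:
                # Centre of the run; for even-length runs this is the upper middle frame.
                centres.append(run_start + (i - run_start) // 2)
            run_start = None
    return centres


if __name__ == "__main__":
    # 3 moving frames, 10 still frames, 2 moving, 4 still (too short), 1 moving.
    trace = [1.0] * 3 + [0.0] * 10 + [1.0] * 2 + [0.0] * 4 + [1.0]
    print(possible_rest_frames(trace))
```

In this toy trace only the 10-frame run qualifies, and the 4-frame run is ignored because it is shorter than the minimum period.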
Upper and lower lip protrusion were greater for English ISP than French ISP in bilinguals perceived as native speakers of both languages, but not in bilinguals who were not perceived as native in both. The tongue tip, tongue body and tongue root were all farther away from the opposing vocal tract surface in the French group than in the English group.

Wilson (2006) notes that variation in anatomical size and proportion could be an explanation. Other possible factors behind this difference between the two language groups include: a higher tongue tip in English due to the sides of the tongue being tethered to the roof of the mouth and the molars; the French jaw being open more often and more widely than the English jaw because of the high frequency of [a] in French compared to English; and English lips being neutral whereas French lips are more rounded and more active in spreading and rounding. The factors accounting for the differences in articulator positions between French and English go beyond the scope of this paper, but they are worth noting.

1.1.5 MRI Studies of ISP

Ramanarayanan et al. (2013) explore human speech production using real-time magnetic resonance imaging (MRI) of the vocal tract. Ramanarayanan et al.’s (2013) procedure extracted frames corresponding to ISP pauses, speech-ready position and absolute rest position from MRI sequences of speech read by 5 English speakers. In addition, the procedure extracted image features that were used to measure vocal tract posture at these time intervals.
Their analysis determined that there are significant differences between vocal tract posture during ISP and at absolute rest position before speech. The results from Ramanarayanan et al. (2013) lend further support to the idea that vocal tract positions differ between rest, speech-ready and ISP postures. Ramanarayanan et al. (2013) state that the default setting can be defined as the set of postural configurations that vocal tract articulators tend to be deployed from and return to in the process of producing fluent and natural speech; in addition, these configurations can be language-specific or speaker-specific. A postural configuration can vary, for example, by keeping the lips in a rounded position or keeping the tongue retracted into the pharynx throughout an entire speech utterance.

Ramanarayanan et al. (2013) ask which articulatory or acoustic variables are used to attain these postures. Prior to their paper, the manner of control used by the speech “planner” during the execution of these postures had not yet been addressed comprehensively using speech articulation data. Ramanarayanan et al. (2013) also focus on understanding vocal tract posture in spoken American English, considering the effects of speaking style (read vs. spontaneous) and position within an utterance and analyzing its postural motor control characteristics.
Postures occurring in silent pauses before speech (speech-ready and absolute rest) and during speech are examined in order to eliminate most of the variation that arises from the articulatory postures required to produce particular speech sounds. For example, one speaker may display a wider opening of the mouth to produce an /r/ sound while another speaker may have a much narrower opening when producing that same sound, or may employ a different muscle or movement.

As previously mentioned, Gick et al. (2004) claimed the existence of a language-specific default position and also that speech rest positions are specified in a manner similar to actual speech targets. Ramanarayanan et al. (2013) mention that further analysis of position within an utterance and of speaking style could have important implications for understanding the speech motor planning process. Ramanarayanan et al. (2013) analyze vocal tract posture in order to answer the following questions: (1) Do articulatory postures occurring in grammatical ISP pauses differ from those in an absolute rest position and also from speech-ready posture (or pre-speech posture)? (2) What can be concluded regarding the degree of active control exerted by the cognitive speech planner (as measured by the variance of appropriate variables that capture vocal tract posture) in each case? (3) Do articulators vary between read and spontaneous speech? The question regarding read vs.
spontaneous speech builds upon a previous study by Ramanarayanan et al. (2009) that explored the hypothesis that pauses at major syntactic boundaries (i.e., grammatical pauses), but not ungrammatical (e.g., word search) pauses, are planned by a high-level cognitive mechanism that also controls the rate of articulation around these areas. In that study MRI was used to measure articulation at and around grammatical and ungrammatical pauses in spontaneous speech. Ramanarayanan et al. (2009) found that grammatical pauses showed an appreciable drop in speed at the pause itself compared with ungrammatical pauses, which supported their hypothesis that grammatical pauses are indeed choreographed by a central cognitive planner. Since it was shown that different speaking styles can affect the articulators, Ramanarayanan et al. (2013) explored read vs. spontaneous speech further. Ramanarayanan et al. (2013) point out that no imaging technique can give a complete view of all vocal tract articulators, which can make the analysis of vocal tract posture difficult. However, developments in real-time MRI make it possible to examine the midsagittal vocal tract during speech production, which provides a way to measure the articulators. Ramanarayanan et al. (2013) had American English speakers read a simple dialog in a conversation with the experimenter (with prompts such as “what music do you listen to...,” “tell me more about your favorite cuisine...,” etc.)
in order to elicit spontaneous spoken responses while inside an MRI scanner. One result of Ramanarayanan et al. (2013) was that vocal tract postures occurring in absolute rest positions are more extreme than, and significantly different from, those occurring in ISPs. Specifically, Ramanarayanan et al. (2013) found the values of several variables (though not velic aperture) during both read and spontaneous ISPs to be significantly higher than those during non-speech rest intervals, which suggests a more closed vocal tract position with a smaller jaw angle and a narrower pharynx at absolute rest compared to the articulatory settings occurring just before speech (speech-ready) and during speech (ISPs). Ramanarayanan et al. (2013) argue that this may indicate that during non-speech rest the tongue rests more nestled in the pharynx and the mouth is more closed. Additionally, Ramanarayanan et al. (2013) found that rest positions displayed relatively high differences compared to speech-ready and ISP positions. The trend was typically seen for the read ISPs. Ramanarayanan et al.’s (2013) methodology is fairly robust to rotation and translation and does not require much manual intervention, while also giving a meaningful comparison across speakers. The results from Ramanarayanan et al.
(2013) using real-time MRI measurements of vocal tract posture show that (1) there is a significant difference in default rest postures compared to speech-ready and inter-speech pause postures, (2) there is a significant trend in most cases for variance between ISP pauses, which appear to be more controlled in their execution vs. rest and speech-ready postures, and (3) read and spontaneous speaking styles also exhibit differences in articulatory postures.

1.2. Visual Speech Information From Facial Movements

Many studies, including Ronquest et al. (2010) and Soto-Faraco et al. (2007), have investigated the perception of visual-only speech, also referred to as “lipreading” or “speech reading”.

1.2.1 The Contribution of Visual Speech Information

Earlier work on visual-only speech has also shown that speech perception is multimodal and that visual signals can both enhance and alter it. Several prior studies, including Soto-Faraco et al. (2007), Munhall & Vatikiotis-Bateson (1998) and Sumby & Pollack (1954), suggest that visual speech information can aid in understanding spoken messages in 1) noisy conditions, 2) second languages and 3) instances where the message is conceptually difficult to understand. When listening to speech in noisy conditions, the face can provide extra information that increases perceptual accuracy (Sumby & Pollack, 1954). Munhall & Vatikiotis-Bateson (1998) note that individual speakers differ in the amount and clarity of phonetic information they provide, and that different speaking styles can also affect how the visual signal influences judgment.
A typical problem throughout much of the past audio-visual research is a lack of information regarding the visual stimuli other than the gender of the speaker. Soto-Faraco et al. (2007) asked how much information can be revealed from visual-only speech signals and expanded upon prior research showing that visual-only signals can relay information to perceivers. Listeners modify their use of visual information depending on the recording conditions (Vatikiotis-Bateson et al., 2007). Ronquest et al. (2010) replicated and expanded upon Soto-Faraco et al. (2007) using two languages that differ in rhythmic classification and timing in order to examine the contribution of rhythmic information in visual-only language processing. Ronquest et al. (2010) found that both the monolingual and the bilingual observers completed the task successfully, which further supported the earlier results of Soto-Faraco et al. (2007).

1.2.2 Language Identification From Visual-Only Cues

Ronquest et al. (2010) mention that a significant amount of research has demonstrated that rhythmic information can be identified in auditory speech and that listeners can distinguish between languages in the absence of lexical or segmental information by relying solely on linguistic rhythm and durational cues. It has also been shown that, in terms of rhythm, stress-timed languages such as English have a higher variation in vowel duration than syllable-timed languages such as Spanish, which show considerably less variation in vowel duration. One experiment in Ronquest et al.
(2010)	  tested	  whether	  observers	  could	  differentiate	  rhythmic	  differences	  of	  syllable-­‐timed	  vs.	  stress-­‐timed	  stimuli	  by	  examining	  visual-­‐only	  cues	  in	  language	  identification	  tasks.	  The	  experiment	  used	  monolingual	  and	  bilingual	  Spanish–English	  participants	  using	  a	  two-­‐alternative	  forced	  choice	  (2AFC)	  task.	  The	  stimuli	  used	  in	  this	  experiment	  consisted	  of	  visual-­‐only	  video	  clips	  of	  English	  and	  Spanish	  sentences	  spoken	  by	  male	  and	  female	  bilingual	  speakers.	  They	  also	  sought	  to	  examine	  additional	  cues	  available	  to	  observers	  for	  language	  identification.	  Another	  experiment	  in	  Ronquest	  et	  al.	  (2010)	  focused	  on	  the	  use	  of	  rhythmic	  cues	  in	  language	  identification	  which	  explored	  rhythmic	  differences	  and	  contributions	  to	  the	  perception	  of	  visual-­‐only	  signals. Their	  results	  show	  that	  language	  identification	  can	  occur	  from	  visual	  signals	  alone	  and	  that	  observers	  are	  able	  to	  identify	  some	  lexical	  items	  from	  a	  visual-­‐only	  display,	  but	  that	  the	  amount	  of	  available	  lexical	  information	  in	  this	  modality	  is	  very	  limited.	  Some	  specific	  words	  such	  as	  common	  lexical	  items	  and	  phrases	  were	  identified	  more	  accurately	  vs.	  less	  common	  ones,	  but	  overall	  the	  percentage	  of	  correct	  word	  identification	  was	  low.	  Their	  results	  also	  showed	  that	  observers	  were	  able	  to	  identify	  languages	  based	  on	  their	  rhythmic	  differences	  in	  visual-­‐only	  stimuli.	  	  Additionally,	  observers	  were	  able	  to	  identify	  stimuli	  that	  were	  temporally	  reversed	  which	  Ronquest	  et	  al.	  
(2010) argue eliminated lexical information but retained rhythmic differences; however, this is debatable, and certain scholars claim that simple reversal of syllables can shift timing attributes. All of the participants in Ronquest et al. (2010) performed significantly above chance in language identification tasks in both forward and reversed conditions, regardless of language background or prior linguistic experience. Since observers were able to identify the language in the backward condition, this supports the idea that rhythmic differences are a cue that aids in language identification and that vowel duration and rhythmic differences among languages can affect how languages are perceived and identified in visual-only speech. Results from Ronquest et al. (2010) also support the idea that the visual signal by itself is sufficient for an observer to correctly identify the language being spoken, expanding upon previous research confirming that prior linguistic experience, lexical information, rhythmic structure, and utterance length can play a role in visual-only language identification.

1.2.3 The Bilingual Advantage in Language Identification of Visual Stimuli

Soto-Faraco et al. (2007) conducted a study similar to Ronquest et al. (2010) using Spanish and Catalan, which are more similar to each other than Spanish and English are. Soto-Faraco et al. (2007) suggested that future studies should examine observers’ ability to discriminate or identify languages that are less closely related than Spanish and Catalan. Soto-Faraco et al.
(2007) examined whether monolingual and bilingual observers could discriminate Spanish from Catalan using visual-only speech stimuli. Soto-Faraco et al. (2007) used two groups of bilinguals (Spanish dominant, Catalan dominant) and three groups of monolinguals (Spanish, Italian, English) who participated in language identification tasks. They found that the bilingual observers discriminated the languages better than the monolingual Spanish observers, who still performed above chance, and that the English and Italian monolingual observers, who had no experience with either language, were not successful at the task. This implies that knowledge of at least one of the languages is necessary in order to accurately discriminate visual-only stimuli. Soto-Faraco et al. (2007) concluded that prior experience with the specific languages, or at least one of them, is one primary factor aiding successful discrimination. Soto-Faraco et al. (2007) also mentioned that several different features of the stimuli affected discrimination, including the length of the utterance and the number of distinctive segments or words present in the stimuli.

1.2.4 Increased Cognitive Load of Visual-Only Speech

de los Reyes Rodríguez Ortiz (2008) mentions that processing visual-only speech signals is more mentally demanding than perceiving oral speech signals. Speech reading requires a certain level of skill in deduction because one must be able to fill in what one cannot hear from oral information. Another consideration discussed in de los Reyes Rodríguez Ortiz (2008) is the correlation between level of intelligence and speech reading accuracy, which has raised certain questions among scholars.
de los Reyes Rodríguez Ortiz (2008) gives the example of the argument that a person with an IQ below 80 will have certain difficulties in processing visual-only speech; however, this issue is still under heavy debate. de los Reyes Rodríguez Ortiz (2008) explains that memory is also considered to be related to accurately interpreting visual-only speech, since high accuracy levels of speech reading generally occur when the perceiver has a high level of working memory, and shows that among prelingually deaf people, the best speech readers were those who possessed higher levels of intelligence and more intelligible speech. While there may be a correlation between a participant’s IQ and their performance in laboratory conditions, the interpretation of the judgments is hard to measure.

1.3. Linguistic Information From Facial Movements & Facial Recognition

Campbell & Massaro (1997) investigate how a speaker’s face shows linguistic information in face-to-face interactions. Their study mentions that prior studies have shown that participants with normal hearing are able to speech read without regular training and that both normal-hearing and hearing-impaired individuals can be trained to recognize visible consonant and vowel phonemes, or ‘visemes’. Campbell & Massaro (1997) ask which features of the face actually convey the information required for speech reading. Studies such as Munhall & Vatikiotis-Bateson (1998), Munhall et al. (2004) and Vatikiotis-Bateson et al.
(2007) have shown that the features in the lower half of the face, including the jaw, lips and cheeks, are what convey the information needed for speech reading. Campbell & Massaro (1997) mention prior studies such as Summerfield (1979), where subjects had to read speech in three conditions: 1) videos where the whole face was displayed, 2) videos where only the lips were shown and 3) a moving ring representation of the lips. Campbell & Massaro (1997) report Summerfield’s (1979) finding that identification accuracy increased by as much as 42.6% in videos where the whole face was displayed vs. videos where only the lips or the moving ring were shown. Campbell & Massaro (1997) describe how faces are thought to be perceived as both individual features and structural relations among features. Second-order structural relations are described as those that remain constant in all stimuli, whereas first-order ones do not. For example, they regard the nose, mouth and eyes as second-order, but the jaw as first-order. Campbell & Massaro (1997) describe how different orders of structural relations have been thought to be visually identified independently of one another and how prior scholars have hypothesized that facial recognition is processed similarly to visual speech recognition.
However, it is difficult not to point out the problems in Campbell & Massaro (1997), particularly in regard to structural relations, because the mouth, nose and eyes can move, and there are muscle components for these body parts just as there are for body parts classified as first-order structural relations (Vatikiotis-Bateson et al., 2007).

Conrey & Gold (2006) discuss how normal-hearing perceivers generally are able to understand visual-only speech (‘lipreading’ or ‘speech reading’), but speakers vary in how easy they are to understand. Beyond the speaker variability in the information given during visual-only speech, there are strategies used by participants that also have an effect on what information they can receive. The accuracy of visual-only speech perception, also known as “speech intelligibility”, has been shown to vary across different speakers, and some speakers are consistently easier than others to speech read. Conrey & Gold (2006) note that more studies have examined variability in auditory speech than in visual-only speech, and that auditory speaker variability has been shown to lower speech intelligibility. One such example given by Conrey & Gold (2006) is when a list of words is read by different speakers, which contributes to lower memory recognition when a previously heard word is presented in a novel voice as opposed to that of the original speaker.
Even though both speakers and observers can vary in how intelligible and accurate their speech and perception are, it has been found that the visual intelligibility of speakers tends to remain consistent across different observers. The question that arises is whether this is due to the properties of the speaker or to the overall perceptual strategies of the observers. An example is given in Conrey & Gold (2006): speaker A, who only moves his/her lips, is known to be less intelligible than speaker B, who moves his/her lips similarly to speaker A but also uses additional jaw movements. It is not known if observers are getting more information from speaker B’s jaw movements or if the observers are using some other cues from speaker B that give a higher accuracy level. It is possible that the observer only looks at jaw movements and as a result sees speaker B as clearer since speaker A has no jaw movements, but if observers looked at the lip movement maybe they would be able to read both speakers at an equal accuracy level. One possible explanation could be that an individual speaker’s properties or speech habits have an effect on an observer’s perceptual strategy. One example of this is given in Lansing & McConkie (2003), who reported observers focusing their gaze more on a speaker’s mouth when visual-only speech was presented vs. visual and auditory speech presented together.
If this is true, then it would support the argument that it is the speakers’ properties that ultimately affect perception and not the perceptual strategies of the observers.

Munhall et al. (2004) demonstrated how head movement plays a role in speech intelligibility, using a custom animation system to create four different audiovisual versions of 20 Japanese sentences: 1) recorded natural head motion & recorded facial motion; 2) zero head motion & recorded facial motion; 3) double head motion (amplitude of head movement doubled in all six degrees of freedom) & recorded facial motion; and 4) auditory-only, with the screen blacked out. The best performance was achieved when participants identified sentences with natural head & facial motion, with the remaining conditions ranked from best to worst in the following order: zero head motion & recorded facial motion, double head motion & recorded facial motion, auditory-only. It makes sense that the auditory-only condition displayed the worst performance, given the many previous investigations showing that removing one of the signals decreases intelligibility, but the head movement factor is quite interesting here and should be noted for future investigations relating to the current study. Conrey & Gold (2006) discuss how measurements of lip opening and vowel duration have been found to be generally good indicators of the perceptual cues for identifying vowels among some speakers, but this also varies across speakers.
Furthermore, perceptual distance from consonant phonemes has been measured using multidimensional scaling (MDS) analysis, which has shown that speakers who were more intelligible had greater correlations between distances from phonemes in the MDS analysis. However, one problem in Conrey & Gold (2006) is that participants looked at visual stimuli that included the markers used on the talkers, so this may have had an effect on observers’ perception. Separating the two factors described in Conrey & Gold (2006), physical variability in the visual information and the perceptual strategies among observers, can be done with a technique called ideal observer analysis (Geisler, 2004). It can be used to estimate the amount of physical information available in a perception task by considering a theoretical observer that produces the best possible performance on a given perceptual task. The goal of Conrey & Gold’s (2006) study was to test whether cross-speaker variability in visual-only speech perception happens because of 1) differences in information available across talkers or 2) different perceptual strategies among observers. Conrey & Gold (2006) established that talker variability in visual-only speech perception happens because of both variability of physical information and the perceptual strategies of observers.
1.4 Goal & Questions of This Study

In this thesis I hope to answer the question of whether utterances occurring in what is defined as ISP and speech-ready position can be identified visually using methods similar to those described in Weikum et al. (2013) and Soto-Faraco et al. (2007), namely whether participants can discriminate between different languages from silent video clips. Are we able to discriminate between two different languages simply by looking at the visible articulatory positioning of a speaker’s face when the speaker is in ISP and/or speech-ready position? Are perceivers better able to identify ISP and speech-ready tokens in the language(s) they speak than in the language(s) they do not? For instance, would a native English speaker watching a silent video of a person speaking be better able to judge whether the speaker in the video is speaking English vs. a different language unfamiliar to the observer? Based on Weikum et al.’s (2013) results, if it is assumed that ISP and speech-ready tokens are perceived in a similar way to full sentences, it is expected that observers will be able to differentiate two different languages if they have familiarity with at least one of them. If ISP and speech-ready position are language specific, this postural information should be more available to perceivers who are familiar with the target language(s) in question.
A pilot study, presented in Chapter 2, was conducted in order to test the hypothesis that perception of ISP is more robust when perceivers are familiar with the target language; it is followed by a more elaborate study of the same question in Chapter 3.

Chapter 2

Native/Non-Native Perceivers Pilot

An initial pilot study was conducted wherein participants with or without previous knowledge of English and French observed short, silent video clips of a bilingual French-English speaker’s face during pre-speech and inter-speech postures.

2.1. Methods

Six participants from two different language groups participated in this study (3 native English speakers with non-native exposure to French, and 3 non-native English speakers with no previous knowledge of French). All 3 of the native English speakers were female, between the ages of 26 and 38, and from the US or Canada, while the non-native English speakers were 2 males and 1 female between the ages of 24 and 43. The female and one of the males were native Japanese speakers from Japan, and the other male was a native Slovenian speaker from Slovenia. All non-native English speakers started learning English as an L2 between 9 and 12 years of age and did not have any knowledge of French. None of the observers in either group had any reported speech or hearing impairments. Below is a more detailed background on each of the individual participants:

Native English Speaker 1: Female from Portland, Oregon, USA.
Native English Speaker 2: Female from Dallas, Texas, USA.
Native English Speaker 3: Female from Vancouver, British Columbia, Canada. Had some exposure to French while living in Montreal, Quebec, while studying at a university. Her courses were taught entirely in English.
Non-native English Speaker 1: Female from Japan who learned English from a Japanese tutor in grade 6, approximately once a week. Tutoring during this time focused only on writing and reading. She started to learn English systematically from a Japanese teacher from grade 7 in public school in Japan. She did not speak with native speakers of English until the age of 18.
Non-native English Speaker 2: Male from Japan who started learning English at around 9-10 years of age.
Non-native English Speaker 3: Male from Slovenia who started learning English at the age of 12.

The stimuli presented were produced from recordings of one balanced bilingual Canadian French/English speaker recorded on video while having a casual conversation with another balanced bilingual Canadian French/English speaker who was off camera. While both speakers were Canadian and spoke English and French, they were from different provinces (Alberta and Quebec) and spoke different varieties of “Laurentian French” (also known as “Québécois French”). One speaker, from Western Canada (Alberta), spoke “Western Canadian French”; the other, from Montreal, spoke “Montreal Canadian French” and also had a Swiss French substrate.
Because only one camera was available, it was decided to keep one speaker off camera, since recording both speakers sitting facing each other with one camera would have made it difficult to get a straight angle on both speakers’ faces. The speakers engaged in two 10-minute conversations, one in English and the other in French. A total of 80 short clips, 40 from each language conversation, were extracted using Final Cut Pro on a Mac computer in two conditions: 1) when the speaker was in the act of an ISP (20 clips per language) and 2) when the speaker was in speech-ready position (20 clips per language). The methods and selection criteria for determining ISP and speech-ready tokens were as follows:

ISP: If the speaker had already started speaking a phrase, an extraction was made anywhere within that phrase where there was no auditory speech. For example, if the speaker said a sentence such as “It was a hot day yesterday”, an extraction could be made during the timeframe between the words ‘it’ and ‘was’ or ‘day’ and ‘yesterday’, etc.

Speech-Ready: If the speaker was not in the act of a phrase. This could consist of moments ranging from when the speaker was just listening to the other speaker all the way up to when the speaker starts to move articulators but before audible speech is made.
Some examples included when the speaker was sitting and nodding his head while listening to the other speaker, or when the speaker was about to utter the sentence “Where did you say you were from?”, in which case the extraction could be made starting from any time while the speaker was merely listening up to the point where the speaker opened his mouth to articulate /w/, but before the onset of the vowel. There were no control factors specifying the length of the stimuli or the body and head movements included within those stimuli extractions. The audio was removed from all of the clips during extraction. Clips ranged from approximately 100 milliseconds to 3 seconds in length.

Participants were tested in a sound-controlled room, viewing the stimuli via MS PowerPoint. They watched all 80 tokens, covering both languages in each of the conditions. The tokens were arranged in a randomized order using an online list randomization tool, and observers had to judge which language each token was in. Observers were not told that they were viewing clips of speakers in ISP or speech-ready position. The administrator of the experiment sat next to the observer inside the sound booth the whole time and navigated to the next token when the observer was ready. This was done in order to make sure observers did not accidentally skip tokens. Observers stated their answer as French or English and the experimenter wrote down the answer.
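The randomized ordering of the 80 tokens described above could also be produced with a short script instead of an online tool, which makes the order reproducible. The following is only a minimal sketch under stated assumptions: the token labels, the function names and the seed value are hypothetical, and the source does not say whether every observer saw the same order.

```python
import random

LANGUAGES = ("English", "French")
CONDITIONS = ("ISP", "speech-ready")
CLIPS_PER_CELL = 20  # 2 languages x 2 conditions x 20 clips = 80 tokens


def build_tokens():
    """Return one (language, condition, clip_number) label per token."""
    # Labels are hypothetical stand-ins for the actual clip files.
    return [
        (lang, cond, n)
        for lang in LANGUAGES
        for cond in CONDITIONS
        for n in range(1, CLIPS_PER_CELL + 1)
    ]


def randomized_order(tokens, seed):
    """Return a shuffled copy; the same seed always gives the same order."""
    rng = random.Random(seed)
    shuffled = list(tokens)
    rng.shuffle(shuffled)
    return shuffled


tokens = build_tokens()
presentation_order = randomized_order(tokens, seed=2014)  # seed is arbitrary
print(len(presentation_order), presentation_order[:3])
```

Fixing the seed lets the same presentation order be regenerated later, for example when matching written answers back to tokens.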
Observers were allowed to request a replay of each token as many times as they wanted before making a judgment. At the end of the experiment each participant was given an explanation of the purpose of the research. All of the participants were asked questions about their performance, such as "Did you find the discrimination task easy or hard?" Several of the participants explained some of the tactics they employed in discriminating between the two languages. The most common tactic described was looking at the opening of the mouth and identifying a token as French if the mouth opening was small or rounded, and as English if the mouth was open wide.

2.2. Results

The results were mixed across all observers for all tokens. In this pilot study there were too few participants to run statistics; however, the results strongly suggest that tokens were identified at chance. Figure 2.1 below illustrates the correctly identified tokens overall and for each language among the different observers. Looking at the figure more closely, we see that the first column represents the total number of tokens judged correctly out of a possible 80. English speaker 1 identified 45 and English speakers 2 and 3 each identified 37. Looking at the non-native English speakers, we see similar results.
Non-native English speaker 1 identified 46 of the overall 80 tokens correctly, while non-native English speaker 2 identified 41 and non-native English speaker 3 identified 39. Figure 2.2 shows the same numbers for participants expressed as percentages. The native English speakers' overall accuracy ranged from 46.25% to 56.25% across all tokens, whereas the non-native (NN) English speakers' overall accuracy ranged from 48.75% to 57.5%.

Figure 2.1 - Correct Tokens Overall for English & French Observers (bars per observer, English Speakers 1-3 and NN English Speakers 1-3: correct tokens overall (80), correct English tokens (40), correct French tokens (40))

Figure 2.2 - Overall % of Correct Tokens by Speaker and Native Language

Based on the data from the pilot study it does not appear that the participants' ability to tell English from French in ISP and speech-ready stimuli has much to do with their age of acquisition, but this factor was not tested systematically in this study. The data here show that ISP and speech-ready tokens do not yield results similar to those for the full sentences in Weikum et al. (2013). If ISP and speech-ready tokens in this study had been perceived the same way as the full sentence and word tokens in Weikum et al.
(2013), the non-native English speakers would have been expected to do worse on the language identification task here. However, all participants across both groups had similar accuracy rates, and both groups appear to have identified tokens at chance. In fact, the non-native speakers as a group did slightly better at identifying tokens overall than the native English speakers, although there was variation among individual observers in both groups in terms of who did better at identifying English tokens vs. French tokens.

It was decided to look at other factors more closely in order to determine what other visual information an observer uses to make a judgment. For a qualitative evaluation, tokens that were identified correctly by all or most observers were analyzed further in an attempt to better understand what cues are used in making a judgment. Of the 80 tokens only one was correctly identified by all six observers, but 8 additional tokens were correctly identified by five of the six observers, and these nine tokens were given closer inspection.
The token identified by all six observers showed the speaker's mouth open very wide, which is a likely reason why it was correctly identified as English, since most of the participants described how they thought tokens displaying a wider mouth opening were English. Also, observers tend to perceive French speech as having more lip rounding and less exaggerated or wide jaw openings and movement. Other visible cues, including head nods, eyebrow movement and tight lip closures, may also provide information about how to judge the target language.

A further explanation of the differences between ISP and speech-ready stimuli as they are defined in this thesis is needed. These two types of stimuli are quite different from each other. Movements occurring in ISP are dynamic transitions between words, whereas speech-ready position may involve little to no movement depending on the individual speaker. The ISP tokens contained more coarticulatory segmental speech content and were on average much shorter in duration than speech-ready tokens. ISP tokens were typically less than half a second in duration, while speech-ready tokens were as long as three seconds. Because of these differences, it is possible that participants may have had more or less difficulty in identifying one condition compared to the other. Future studies should fully compare the differences in the perception of ISP and speech-ready stimuli across each language.
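The pilot counts reported in Section 2.2 can be converted to the quoted percentages with a few lines of Python. The sketch below also runs an exact two-sided binomial test against the 50% chance level for each observer. That per-observer test is an illustrative assumption added here, not an analysis reported in this thesis, and the names `pilot_correct`, `accuracy_pct` and `binom_two_sided_p` are hypothetical.

```python
from math import comb

N_TOKENS = 80  # tokens seen by every pilot observer

# Correct-token counts as reported in Section 2.2.
pilot_correct = {
    "English 1": 45, "English 2": 37, "English 3": 37,
    "NN English 1": 46, "NN English 2": 41, "NN English 3": 39,
}

def accuracy_pct(correct, total=N_TOKENS):
    """Percentage of tokens identified correctly."""
    return 100.0 * correct / total

def binom_two_sided_p(correct, total=N_TOKENS):
    """Exact two-sided binomial p-value against chance (p = 0.5).

    Illustrative only: no such test is reported for the pilot.
    Because the null distribution is symmetric, the two-sided p-value
    is twice the upper tail at the more extreme of k and n - k.
    """
    k = max(correct, total - correct)
    upper_tail = sum(comb(total, i) for i in range(k, total + 1)) / 2 ** total
    return min(1.0, 2.0 * upper_tail)

for observer, correct in pilot_correct.items():
    print(f"{observer}: {accuracy_pct(correct):.2f}% "
          f"(p vs. chance = {binom_two_sided_p(correct):.3f})")
```

With these counts no observer's p-value falls below 0.05, which is in line with the chance-level reading given in the Results, although six observers remain too few for any group-level conclusion.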
Descriptive comments on the nine most frequently correctly identified tokens are shown in Table 2.1 below:

| Identified correctly by | Token type | Comments |
|---|---|---|
| All participants | English ISP 17 | Mouth open very wide |
| 5/6 participants | English ISP 02 | No lip rounding, little to no protrusion |
| 5/6 participants | English ISP 05 | No lip rounding, little to no protrusion |
| 5/6 participants | English ISP 08 | No lip rounding, little to no protrusion, head nodding |
| 5/6 participants | English Speech Ready 06 | Significant lower jaw movement, eyebrow raising |
| 5/6 participants | English Speech Ready 07 | Head nodding, lips pressed together tight |
| 5/6 participants | English Speech Ready 13 | Head nodding, lips pressed tightly together, some eyebrow movement |
| 5/6 participants | French Speech Ready 02 | Mouth open w/ slight rounding |
| 5/6 participants | French Speech Ready 12 | No actual lip movement, but position of lips slightly open, head moves sideways |

Table 2.1 - Tokens Identified by at Least 5 Out of 6 Participants

2.3. Discussion

The goal of this study was to expand upon data from Weikum et al. (2013) with ISP and speech-ready stimuli in order to determine if age of acquisition of a language contributes to adults' ability to distinguish different languages in visual-only speech. Based on the results it appears that age of acquisition alone is not likely to determine this ability.
Observers need other informational cues from the visual stimuli in order to make a correct judgment, but the question remains what exactly those cues are. The tokens identified correctly by all or most observers showed various properties in the visual signal, including head nodding and movement, eyebrow raising and movement, lip closures and openings, and lip protrusion. These factors may affect how an observer perceives silent ISP and speech-ready stimuli. There may also be additional information not seen or mentioned here that plays a role in this task. It is not known whether these additional factors would help native English speakers or non-native English speakers more in this study. Based on the information covered so far, it appears that observers are no better or worse at using visual cues to identify tokens regardless of what language they speak or when they learned it.

At the end of the experiment observers were asked if they found certain tokens harder or easier to judge, and every participant across both groups stated that the task was highly difficult and that no tokens were easier than others. Participants did not know during the experiment that they were judging ISP and speech-ready tokens; they only knew they were looking at either French or English, although they could take note of the duration of each token. All of the tokens were very short in duration compared to the full sentence and word tokens used in Weikum et al.
(2013), but some of the tokens in this study were still considerably shorter than others.

Looking at Table 2.1 more closely shows that the tokens English ISP 02, English ISP 05 and English ISP 08 had no lip rounding with little to no protrusion. These properties may have contributed to participants perceiving these tokens as English. We know that 5 of the 6 participants correctly identified these tokens, but what if tokens displaying similar properties had been extracted from the French conversation? The same can be asked of the English ISP 17 token, which displays a wide opening of the mouth and which was identified correctly by all 6 participants. Is this because observers associate wide mouth opening with English? To what extent are observers using these visual properties to make a decision? While the properties of specific tokens are noted, they were not analyzed in detail, which is something that should be expanded upon in future related studies. Based on the results it does not appear that participants were better at identifying speech-ready tokens than inter-speech tokens, regardless of their native language, but we do not have concrete evidence for this and more work will need to be done in this regard.

Chapter 3

Native English vs. Native French Perceivers

In order to better understand the results of the study presented in Chapter 2, another study was conducted with more perceivers, and with both native French-speaking and native English-speaking perceivers. The participants in this experiment consisted of both native English speakers with L2 knowledge of French and native French speakers with L2 knowledge of English. Weikum et al. (2013) did not use any French speakers in their study and found that observers only needed to have a degree of proficiency in one of the two languages to discriminate in a same/different task between English and French sentences. For this second experiment it was not expected that native French speakers would discriminate better overall than native English speakers or vice versa. The main hypothesis for this second experiment is that native speakers are better able to perceive inter-speech and pre-speech postures in their first language given only visual information. An additional question for the second experiment was whether English speakers and/or French speakers would correctly identify the same tokens as the majority of the participants in the first experiment (shown in Table 2.1). A further question was whether French speakers would discriminate certain tokens better than English speakers, and vice versa.

3.1. Methods

The total number of observers included 7 native English speakers and 7 native French speakers.
The native English speakers were all females between the ages of 21 and 28 from western Canada (6 were from BC and 1 was from Alberta). The native French speakers were 4 males and 3 females, with 6 of the speakers being from France and 1 female speaker being from Quebec, Canada. The French speakers were 20-34 years old (5 of the speakers were exactly 20 years of age). Below is a more detailed background for each speaker:

English Speaker 1: 21 year old female born in Vancouver. Started using English from birth, and studied French from grades 7-12 (age 13-18). Highest level of education in French is grade 12 level. There was a gap of not speaking or reading French from age 18-20. Has recently (since Jan 2014) used French only occasionally, for tutoring work.

English Speaker 2: 23 year old female born in Vancouver. Has used English from birth and uses it predominantly. She studied French from elementary to high school from age 9 to 18, focusing mainly on grammar and vocabulary, but not spoken French. Her French schooling was not an immersive environment. She also studied French in university from age 20-21 at an intermediate level with more focus on spoken French. Currently she rarely uses French, speaking it only once in a while.

English Speaker 3: 25 year old female born in Surrey, BC. Started learning French around grade 4 or 5 and took French courses up until grade 11.
Returned to studying French with courses during first year of university in 2006/2007 in Detroit, where French courses were equivalent to high school level. Currently only uses English.

English Speaker 4: 24 year old female born in New Westminster, BC. She studied French from grades 6-8, when she was between 11 and 13 years old, and only used it in school for French class.

English Speaker 5: 23 year old female born and raised in Quesnel, B.C. She started using French at the age of 5 in a French immersion school, which she attended for 13 years. She had one gap year afterwards and continued to take French courses at university for 3 more years. She took 100/200-level university courses with no gaps. At university she only used French within her courses. In 2011, she studied French for 4 months in Nantes, France, taking intermediate level courses. She used French on a daily basis for the majority of her stay in all settings. Since then she has not taken any further French courses.

English Speaker 6: 28 year old female born in Victoria, BC. English was used for all of her education and she uses English every day at school, work and home.

English Speaker 7: 21 year old female born in Edmonton, Alberta. She attended school and university taught exclusively in English. She has spoken English since birth and has very little experience with French, having taken one or two beginner classes around the age of 12.
Her French use was limited to the classroom and she has not used it since. English is the only language she has ever used on a regular basis.

French Speaker 1: 20 year old female born in Brittany, France. She has used French daily, even in Canada, having always been surrounded by French people. She started learning English at school at 8 years of age and had only a few hours of English lessons per week until the end of middle school. She took more advanced classes in English literature and history in high school (approximately 8 hours per week). Her classes at university were exclusively taught in English, mostly by native speakers, and she used this language to write academic papers, give oral presentations and converse with international classmates. She was considered fluent in English by her French university.

French Speaker 2: 20 year old male born in Niort (west of France, near the Atlantic Ocean). Started learning basic English at the age of 8 at school (numbers, alphabet, introductions, etc.), with more comprehensive instruction beginning around middle school (age 11). Throughout his middle school and high school years, the teaching of English was not outstanding. He had sufficient English grammatical structure and vocabulary and became interested in improving on his own time. His pronunciation improved from watching English movies and TV shows.
When he entered university in Paris, he began meeting exchange students from Britain while continuing English studies in which all his teachers were native English speakers (from Canada, Ireland and the USA), which was not the case in grade school. Since coming to UBC, he uses French every day with fellow exchange friends but has no problem attending a class or having conversations with foreign friends in English.

French Speaker 3: 20 year old female born in Orsay, France. Started learning English in primary school around 8-9 years old. She has never stopped using French, but English was part of her upbringing, as her parents sometimes spoke English for their jobs, though she never had conversations in English in their daily lives. Since she has had the chance to live and study at UBC for a whole academic year, her English has improved. In high school and at her home university in France, she had English classes for 3 or 4 hours maximum per week.

French Speaker 4: 20 year old male born in Saint-Cloud, France. He started to learn English at school when he was 9 and has been speaking English continuously from that time. His language of instruction from 3 to 18 years old was French. During his second year at university in France he had courses in French and English. During his year at UBC his instruction has been in English only. He uses French every day with friends and by reading web pages.
He also uses English every day, watching series, reading internet sites and speaking with foreign friends.

French Speaker 5: 34 year old female born in Québec, Canada. She attended French grade schools with 1 class of English per week taught in elementary and high school. She started using English on a regular basis at the age of 16. Since the age of 20 she has used written and spoken English at work and reads papers in English at university.

French Speaker 6: 20 year old male born in Nimes, France. He started studying English at the age of 12 but rarely practiced, using it only 2 hours per week in class. In 2011 he started studying English more intensively. In August 2013 he came to Vancouver and started using English on a daily basis. Recently he received the maximum level on his English exam through his French university, which is equivalent to the European C2 level.

French Speaker 7: Male from France. In 2013, spent the summer in Vancouver with everyday exposure to English. He used both French and English on a daily basis.

The procedure introduced some changes from the pilot study described in Chapter 2. Because of these changes in presentation format, which are discussed further below, the stimuli were not presented in exactly the same way as in the experiment in Chapter 2.
While the signal content of each stimulus was the same as before, experiment 1 used MS PowerPoint slides to present the stimuli to observers, with the experimenter sitting next to the observer, navigating through the stimuli and writing down the observer's responses. For this second experiment, the video stimuli were presented through the PsyScope (Cohen et al., 1993) application, which also recorded all participants' responses along with their response times and eliminated the need for the experimenter to be in the room with the participant. At the beginning of each session, participants went through a training session consisting of 20 trials in a randomized order. There were 3 different tokens in the training session. At the beginning of each trial the stimulus would appear with the video paused at the very beginning. The participant would then be required to press the spacebar for the video to play. Once the video was finished the screen would go blank, at which point the observer was required to enter '1' for English or '9' for French before they were able to proceed to the next trial. At the end of the training session an instruction screen indicated that the actual experiment was starting; the experiment then used the same 80 tokens that were used in the experiment described in Chapter 2. All of the tokens were randomized through PsyScope. The participants had full control over when to start playback of each token.
All participants viewed the stimuli in a sound booth or a quiet, empty room.

3.2. Results and Discussion

Statistics for accuracy were calculated using R, taking proportion correct (PC) and total correct (TC) by speaker group for each individual token. For example, the token English ISP1A had a PC value of 0.571 and a TC value of 4 for the English group, which means that 4 of the 7 English speakers identified this token correctly, a 57% accuracy rate. Overall there did not appear to be a significant difference between the accuracy rates of the English and French speakers across all tokens. The English speakers' overall accuracy rate across all tokens was near 50%, whereas the French speakers' overall accuracy rate was slightly above 50% (see Table 3.1 below). When language of the stimulus is not a factor, the two groups perform equally on the task (mean English group accuracy = 49.11%, mean French group accuracy = 52.50%, t(1118) = -1.14, p = .257). This shows that the groups performed similarly on the task. When speakers' accuracy on their native language was compared to their accuracy on their L2, irrespective of what their native language was, a significant difference was observed (mean L1 accuracy = 54.29%, mean L2 accuracy = 47.32%, t(1118) = 2.33, p = .020).
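The degrees of freedom and t-values above can be sanity-checked from the reported means alone. The sketch below assumes that each of the 14 observers contributed 80 binary (correct/incorrect) responses, so that each comparison involves two sets of 560 responses, and that a pooled-variance two-sample t-test was used; the exact R call is not stated in the text. The correct-response counts are back-calculated from the rounded percentages and are therefore assumptions, as is the helper name `pooled_t`.

```python
from math import sqrt

def pooled_t(correct_a, n_a, correct_b, n_b):
    """Pooled-variance two-sample t statistic for 0/1 accuracy data."""
    mean_a, mean_b = correct_a / n_a, correct_b / n_b
    # The sum of squared deviations of a 0/1 variable is n * p * (1 - p).
    ss_a = n_a * mean_a * (1 - mean_a)
    ss_b = n_b * mean_b * (1 - mean_b)
    df = n_a + n_b - 2
    pooled_var = (ss_a + ss_b) / df
    t = (mean_a - mean_b) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df

# Proportion correct for one token: TC = 4 of 7 English observers.
pc_example = round(4 / 7, 3)

N = 7 * 80  # 7 observers per group x 80 tokens = 560 responses

# Counts inferred from the reported percentages (assumptions):
# 275/560 = 49.11%, 294/560 = 52.50%, 304/560 = 54.29%, 265/560 = 47.32%.
t_group, df_group = pooled_t(275, N, 294, N)  # English group vs. French group
t_l1l2, df_l1l2 = pooled_t(304, N, 265, N)    # L1 tokens vs. L2 tokens

print(f"PC example: {pc_example}")
print(f"group: t({df_group}) = {t_group:.2f}")
print(f"L1/L2: t({df_l1l2}) = {t_l1l2:.2f}")
```

Under these assumptions the statistics come out at roughly -1.14 and 2.33 on 1118 degrees of freedom, matching the reported values. Note that treating all 1120 responses as independent ignores the fact that each observer contributed 80 of them.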
However, when the results are broken down by the first language of the participants, the significant difference in accuracy emerges only for the native English speakers and not for the native French speakers. Two-way t-tests were run in R separately for each group (English L1 speakers, French L1 speakers) to compare performance on English vs. French tokens, and the values for each group are shown in Table 3.1 below:

| | English group t-test results | French group t-test results |
|---|---|---|
| t | -2.46 | 0.84 |
| df | 558 | 558 |
| p-value | 0.014 | 0.399 |
| mean English resp | 55.00% | 50.00% |
| mean French resp | 44.64% | 53.57% |

Table 3.1 - T-Test Results for English and French Groups

The mean English and French resp. values are the accuracy rates for the English and French tokens, respectively. Despite the non-significant difference between the English and French groups across tokens overall, both groups had a higher accuracy rate when judging tokens in their respective native languages. The English group showed an accuracy rate around 10 percentage points higher on the English tokens, while the French group appeared to be more balanced but still showed higher accuracy when judging French tokens. These results are illustrated in Figure 3.1 below.
Figure 3.1 - Accuracy % by Native Language (bars: English tokens and French tokens, for English speakers and French speakers)

The results from the second experiment in this chapter lend support to the hypothesis posed in Chapter 1 that observers are better able to identify ISP and speech-ready postures visually in the languages they speak natively than in the ones they do not. This suggests that language experience and background contribute to the ability to use these visual signals, which have no segmental content. The results also allow us to address another question that was posed in the introduction to this chapter: whether observers who are native French speakers are able to discriminate certain tokens better than observers who are native English speakers, and vice versa.

It is important to note again the differences between ISP and speech-ready stimuli because of the different information available in the two conditions. It was decided to investigate the ISP and speech-ready stimuli more closely between the two groups. A confusion matrix was created in R to illustrate how French and English participants responded to the different stimuli in each of the conditions (ISP and speech-ready) across each language, shown in Figure 3.2 below.

Figure 3.2 - Confusion Matrix for Subjects' Identification Responses
Looking at the matrix (lighter shades represent a higher number of responses, darker shades a lower number), the French subjects responded 'French' and 'English' equally often for all ISP stimuli (left column). This suggests that French participants were guessing roughly evenly for all ISP stimuli regardless of the language. The English subjects responded 'French' more when the ISP stimuli were French and less when the ISP stimuli were English, and vice versa for 'English' responses. So the English participants seemed to do better at identifying ISP stimuli than the French participants, who appeared to be guessing across all ISP stimuli. For speech-ready stimuli, French subjects picked 'English' less and 'French' more when the stimuli were French, so they did well at identifying French stimuli in the speech-ready condition. However, French subjects seemed to pick 'English' and 'French' equally often when the speech-ready tokens were in fact English. The English subjects appeared to be biased toward picking 'English' in the speech-ready condition overall: they picked 'English' more, and 'French' less, regardless of whether the speech-ready stimulus was English or French.
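Response counts of the kind summarised in the confusion matrix are often condensed into a single sensitivity measure, d-prime, which separates discrimination ability from response bias. The sketch below is a generic illustration, not the R code used for this thesis. It assumes that responding 'English' to an English stimulus counts as a hit and responding 'English' to a French stimulus counts as a false alarm; the counts are made up for demonstration, and `d_prime` is a hypothetical helper.

```python
from statistics import NormalDist

def d_prime(hits, n_signal, false_alarms, n_noise):
    """d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count and 1 to each total)
    keeps the rates away from 0 and 1, where z would be infinite.
    """
    z = NormalDist().inv_cdf
    hit_rate = (hits + 0.5) / (n_signal + 1)
    fa_rate = (false_alarms + 0.5) / (n_noise + 1)
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts (NOT data from the thesis): 7 observers x 20 tokens
# = 140 responses per stimulus language within one condition.
print(round(d_prime(hits=84, n_signal=140, false_alarms=70, n_noise=140), 3))

# Equal hit and false-alarm rates give d' = 0, i.e. no sensitivity.
print(d_prime(hits=70, n_signal=140, false_alarms=70, n_noise=140))
```

A d-prime of zero means 'English' is chosen equally often for both stimulus languages, so values close to zero indicate near-chance discrimination whatever the overall bias toward one response.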
D-prime scores were calculated in R. The results were: ISP for English L1 subjects (0.01790544), ISP for French L1 subjects (0.12566135), speech-ready for English L1 subjects (-0.03697919), and speech-ready for French L1 subjects (0.05435102).

In addition, an interaction plot (shown in figure 3.3 below) was created showing the mean accuracy for speech-ready and ISP tokens across both language groups. In the interaction plot, there does not appear to be a significant difference for either language group in the ISP condition. The English group identified ISP tokens with about 50% accuracy whether they were English or French. The French group identified ISP tokens slightly better when the stimuli were English (55%) than when they were French (50%), but this difference did not appear significant. A significant difference seems to appear only for the speech-ready tokens, where the English group is around 20% better on English speech-ready stimuli than on French ones. The French group also had a higher accuracy in identifying French speech-ready stimuli than English ones, though the gap was not as wide for them as it was for the English group.

Figure 3.3 - Mean Accuracy Rate by Stimulus Language

Two-way and three-way ANOVAs were calculated in order to test the importance of stimulus language, subject’s L1, and stimulus condition (ISP, speech-ready).
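The cell means behind an interaction plot of this kind, and the contrast that a stimulus language by subject L1 interaction term examines, can be sketched as follows. This is a hedged Python illustration, not the analysis run in R for this thesis: the trial layout (subject L1, condition, stimulus language, response), the helper names, and all counts are assumptions made for the example.

```python
from collections import defaultdict


def cell_means(trials):
    """Mean accuracy per (subject L1, condition, stimulus language) cell."""
    totals = defaultdict(lambda: [0, 0])  # cell -> [n_correct, n_trials]
    for l1, condition, stimulus, response in trials:
        cell = totals[(l1, condition, stimulus)]
        cell[0] += int(response == stimulus)
        cell[1] += 1
    return {cell: correct / n for cell, (correct, n) in totals.items()}


def interaction_contrast(means, condition):
    """Difference of differences: (English group's English-minus-French
    accuracy) minus (French group's English-minus-French accuracy)."""
    english_gap = (means[("English", condition, "English")]
                   - means[("English", condition, "French")])
    french_gap = (means[("French", condition, "English")]
                  - means[("French", condition, "French")])
    return english_gap - french_gap


def make_trials(l1, condition, stimulus, n_correct, n_total):
    """Build hypothetical trials with a given number of correct responses."""
    other = "French" if stimulus == "English" else "English"
    responses = [stimulus] * n_correct + [other] * (n_total - n_correct)
    return [(l1, condition, stimulus, r) for r in responses]


# Hypothetical counts chosen only to illustrate the computation.
trials = (
    make_trials("English", "speech-ready", "English", 14, 20)
    + make_trials("English", "speech-ready", "French", 10, 20)
    + make_trials("French", "speech-ready", "English", 9, 20)
    + make_trials("French", "speech-ready", "French", 12, 20)
)

means = cell_means(trials)
for cell, accuracy in sorted(means.items()):
    print(cell, accuracy)
print("interaction contrast:", interaction_contrast(means, "speech-ready"))
```

A contrast near zero would mean both groups show the same accuracy gap between English and French stimuli; a contrast that differs between the ISP and speech-ready conditions is the pattern a three-way interaction term captures.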
The two-way interaction of stimulus language and subject’s L1 was found to be significant, while the three-way interaction of stimulus language, subject’s L1, and stimulus condition was found to be highly significant.

- Stimulus language:Subject L1 = Df (1), Sum Sq (1.36), Mean Sq (1.358), F value (5.476), Pr(>F) 0.01945
- Stimulus language:Subject L1:Stimulus condition = Df (1), Sum Sq (2.32), Mean Sq (2.3223), F value (9.365), Pr(>F) 0.00227

Qualitative properties of the tokens that were most and least correctly identified in experiment 2 were also noted, just as they were in experiment 1. Table 3.2 below lists the tokens that were either correctly or incorrectly identified by all members of each group.

Group | Token | Correct/Incorrect | Qualitative properties
English | English speech ready 4A | Correct | Head tilted, mouth open
English | French ISP 5A | Correct | Hand movement, eyes looking to the side, mouth open wide
English | English speech ready 8A | Incorrect | Lip rounding in beginning transitions to smile
English | French speech ready 13A | Incorrect | Head nodding, licking of lips, lip tightening
French | English speech ready 9A | Correct | Open mouth smile
French | English speech ready 15A | Correct | Head movement, mouth open
French | English ISP 8A | Correct | Mouth open, tongue touches teeth
French | English ISP 14A | Correct | Lower jaw dropping
French | French speech ready 11A | Correct | Slight rounding of lips, lip tightening
French | English ISP 6A | Incorrect | Open mouth smile, slight lip rounding
French | English ISP 12A | Incorrect | Rapid mouth opening/closing

Table 3.2 - Tokens Correctly or Incorrectly Identified by All Members of Each Language Group

None of the tokens were correctly or incorrectly identified by all participants in both groups. Inspecting the descriptive data, there were 2 tokens that were always identified correctly by all participants in the English group (English speech ready 4A, French ISP 5A) and 2 tokens that were always identified incorrectly (English speech ready 8A, French speech ready 13A). Chapter 2 discussed individual stimulus tokens that were correctly or incorrectly identified by all or most participants in experiment 1; experiment 2 likewise shows certain stimulus tokens that were consistently identified correctly or incorrectly by participants in the two language groups. Looking at table 3.2, we see that all the participants in the English group correctly identified the English speech ready 4A token, while they all incorrectly identified the English speech ready 8A token. English speech ready 4A showed the head tilted with the mouth open, so it is possible that the participants in the English group associate these properties with the English language, which may have contributed to their perfect accuracy on this particular token. On the other hand, the English speech ready 8A token, which displayed some lip rounding, was incorrectly identified by all English participants, perhaps because lip rounding is thought of as a French-like property.
The French speech ready 13A token was also incorrectly identified by all English participants. This token showed head nodding and lip tightening, and perhaps these properties are generally perceived as features of English by the English participants. It is difficult to say exactly what is contributing to each individual participant’s perception in either group, but it would be valuable to consider the descriptive properties of these individual tokens in future work.

Chapter 4
Conclusion

The data from the two experiments suggest that an observer’s native language does affect his or her ability to discriminate visually between languages in ISP and speech-ready tokens. Weikum et al.’s (2013) data already showed that it is possible to distinguish languages visually when using stimuli containing segmental speech information in the form of full sentences and words. Weikum et al. (2013) also explained how the length of the stimulus can be a factor in how easy or difficult it is for observers to make a judgment. Their data support the hypothesis that the longer the stimuli, the more information observers can use to make an effective judgment. Prior to the present thesis, stimuli such as the ISP and speech-ready tokens used in the present experiments had not been tested in a study of visual-only perception. Despite the tokens in the present thesis being much shorter than those used by Weikum et al. (2013) and largely lacking segmental speech information, the present results nevertheless indicate that native speakers are able to visually identify speech-ready postures of their native language. It is important to note that these speech-ready postures were longer in duration but contained less segmental coarticulatory information than the inter-speech postures, suggesting that perceivers are using non-segmental visual information about facial posture to distinguish between English and French.

There remain several things to explore here. One possible step is to analyze more closely the tokens that were identified correctly or incorrectly by all members of the English or French groups, as shown in table 3.2, to identify qualitatively the features perceivers are using (either correctly or incorrectly) to identify their native language. Based on observations in the present study, these features are likely to involve more than just articulator positioning, such that gestures including head movements, nodding, eye movement, etc. will have an effect on how different observers respond. Future studies should focus on extralinguistic information such as this in order to better understand the correlation between perception and gesture. Also, given the variation observed across individual participants, it will be important for future studies to include relatively large numbers of observers.

References

Campbell, C. S., & Massaro, D. W. (1997). Perception of visible speech: Influence of spatial quantization.
Perception, 26, 627–644. [PubMed: 9488886]

Cohen, J. D., MacWhinney, B., Flatt, M., & Provost, J. (1993). PsyScope: A new graphic interactive environment for designing psychology experiments. Behavior Research Methods, Instruments, & Computers, 25(2), 257-271.

Conrey, B., & Gold, J. (2006). An ideal observer analysis of variability in visual-only speech. Vision Research, 46, 3243-3258.

Geisler, W. S. (2004). Ideal observer analysis. In J. S. Werner & L. M. Chalupa (Eds.), The visual neurosciences. Cambridge, Mass: MIT Press, Chapter 52.

Gick, B., Wilson, I., Koch, K., & Cook, C. (2004). Language-specific articulatory settings: Evidence from inter-utterance rest position. Phonetica, 61, 220-233.

Lansing, C. R., & McConkie, G. W. (2003). Word identification and eye fixation locations in visual and visual-plus-auditory presentations of spoken sentences. Perception & Psychophysics, 65(4), 536–552.

Munhall, K., & Vatikiotis-Bateson, E. (1998). The moving face during speech communication. Hearing by Eye II: Advances in the Psychology of Speechreading and Auditory-visual Speech, 2, 123-137.

Munhall, K., Jones, J., Callan, D., Kuratate, T., & Vatikiotis-Bateson, E. (2004). Visual prosody and speech intelligibility: Head movement improves auditory speech perception. Psychological Science, 15(2), 133-137.

Ramanarayanan, V., Goldstein, L., Byrd, D., & Narayanan, S. S. (2013). An investigation of articulatory setting using real-time magnetic resonance imaging. The Journal of the Acoustical Society of America, 134, 510-519.

Ramanarayanan, V., Bresch, E., Byrd, D., Goldstein, L., & Narayanan, S. S. (2009). Analysis of pausing behavior in spontaneous speech using real-time magnetic resonance imaging of articulation. The Journal of the Acoustical Society of America, 126(5), EL160-EL165.

de los Reyes Rodríguez Ortiz, I. (2008). Lipreading in the prelingually deaf: What makes a skilled speechreader? The Spanish Journal of Psychology, 11(2), 488-502.

Ronquest, R., Levi, S., & Pisoni, D. (2010). Language identification from visual-only speech signals. Attention, Perception, & Psychophysics, 72(6), 1601-1613. doi:10.3758/APP.72.6.1601

Schaeffler, S., Scobbie, J., & Mennen, I. (2008). An evaluation of inter-speech postures for the study of language-specific articulatory settings. 8th International Seminar on Speech Production, 121-124.

Soto-Faraco, S., Navarra, J., Weikum, W., Vouloumanos, A., Sebastián-Gallés, N., & Werker, J. (2007). Discriminating languages by speech-reading. Perception & Psychophysics, 69(2), 218-231.

Sumby, W. H., & Pollack, I. (1954). Visual contribution to speech intelligibility in noise. The Journal of the Acoustical Society of America, 26(2), 212-215.

Vatikiotis-Bateson, E., Barbosa, A., Yi Chow, C., Oberg, M., Tan, J., & Yehia, H. (2007). Audiovisual Lombard speech: Reconciling production & perception. Auditory-Visual Speech Processing 2007.

Weikum, W. M., Vouloumanos, A., Navarra, J., Soto-Faraco, S., Sebastián-Gallés, N., & Werker, J. (2007). Visual language discrimination in infancy. Science, 316(5828), 1159.

Weikum, W., Vouloumanos, A., Navarra, J. (2013). Age-related sensitive periods influence visual language discrimination in adults. Frontiers in Systems Neuroscience, 7, 1-8. doi: 10.3389/fnsys.2013.00086

Wilson, I. (2006). Articulatory settings of French and English monolingual and bilingual speakers. PhD dissertation, University of British Columbia.

Wilson, I., & Gick, B. (2013). Bilinguals use language-specific articulatory settings. Journal of Speech, Language, and Hearing Research. doi: 10.1044/2013_JSLHR-S-12-0345

