UBC Theses and Dissertations
Towards an emotionally communicative robot : feature analysis for multimodal support of affective touch… Cang, Xi Laura 2016

Your browser doesn't seem to have a PDF viewer, please download the PDF to view this item.

Item Metadata


24-ubc_2016_november_cang_xilaura.pdf [ 8.92MB ]
JSON: 24-1.0314100.json
JSON-LD: 24-1.0314100-ld.json
RDF/XML (Pretty): 24-1.0314100-rdf.xml
RDF/JSON: 24-1.0314100-rdf.json
Turtle: 24-1.0314100-turtle.txt
N-Triples: 24-1.0314100-rdf-ntriples.txt
Original Record: 24-1.0314100-source.json
Full Text
Towards An Emotionally Communicative Robot: Feature Analysis for Multimodal Support of Affective Touch Recognition

by

Xi Laura Cang

BSc Mathematics, The University of British Columbia, 2007
BEd Secondary, The University of British Columbia, 2010
BCS, The University of British Columbia, 2014

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF Master of Science in THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES (Computer Science)

The University of British Columbia (Vancouver)

August 2016

© Xi Laura Cang, 2016

Abstract

Human affective state extracted from touch interaction takes advantage of natural communication of emotion through physical contact, enabling applications like robot therapy [88, 104], intelligent tutoring systems [25], emotionally-reactive smart tech, and more. This work focused on the emotionally aware robot pet context and produced a custom, low-cost piezoresistive fabric touch sensor at 1-inch taxel resolution that accommodates the flex and stretch of the robot in motion. Using established machine learning techniques, we built classification models of social and emotional touch data. We present an iteration of the human-robot interaction loop for an emotionally aware robot [110] through two distinct studies and demonstrate gesture recognition at roughly 85% accuracy (chance 14%).

The first study collected social touch gesture data (N=26) to assess data quality of our custom sensor under noisy conditions: mounted on a robot skeleton simulating regular breathing, obscured under fur casings, and placed over deformable surfaces.

Our second study targeted affect with the same sensor, wherein participants (N=30) relived emotionally intense memories while interacting with a smaller stationary robot, generating touch data imbued with the following: Stressed, Excited, Relaxed, or Depressed.
A feature space analysis triangulating touch, gaze, and physiological data highlighted the dimensions of touch that suggest affective state.

To close the interactive loop, we had participants (N=20) evaluate researcher-designed breathing behaviours on 1-DOF robots for emotional content. Results demonstrate that these behaviours can display human-recognizable emotion as perceptual affective qualities across the valence-arousal emotion model [83]. Finally, we discuss the potential impact of a system capable of emotional “conversation” [89] with human users, referencing specific applications.

Preface

I have been fortunate to have been able to collaborate with a number of people throughout the studies described in Chapters 2–4. For each work, I describe the nature of my role and recognize my colleagues for their contribution.

Chapter 2 (Gesture Classification) is a conference paper on touch gesture classification published at the International Conference on Multimodal Interaction (ICMI 2015) [1] and is presented here in its entirety. My co-authors include labmates Paul Bucci (fellow MSc student), Andrew Strang (Research Engineer), Jeff Allen (PhD student), Sean Liu (summer undergraduate research assistant), and supervisor Dr. Karon MacLean. The study concept, analysis procedures, and full paper writing were done by me. However, collaborators provided regular feedback and helped refine my original study proposal, informing the resulting study design and analysis process. Furthermore, Paul and Sean worked with me to conduct the actual data collection for the latter half of the study. I carried out final editing in conjunction with Paul and Karon.

[1] Cang XL, Bucci P, Strang A, Allen J, MacLean K, Liu HY. Different strokes and different folks: Economical dynamic surface sensing and affect-related touch recognition. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, 2015 Nov 9 (pp. 147-154). ACM.

A version of the paper intended for an upcoming journal submission forms Chapter 3 (Affect Detection). The featured study describes the collection and classification of emotion data from gaze and biometric support of touch data. This is again a collaborative effort between our Sensory Perception and Interaction (SPIN) lab and Dr. Jussi Rantala, a post-doc visiting from the University of Tampere in Finland at the time of the study. The author list consists of myself, Dr. Jussi Rantala, labmate Paul Bucci, and supervisor Dr. Karon MacLean. My independent contribution to this project includes the original study proposal, all touch-related background research and materials for data collection, the integrated multimodal analysis, and the final paper writing. Again, our team met regularly to collaboratively refine and improve the study design, enlisting the expertise of Dr. Jessica Tracy of UBC's Emotion Lab before deciding on an emotion elicitation methodology. Jussi and I wrote the logging software for integration of gaze and touch data together; he brought gaze expertise, and I touch. The two of us then ran participants to collect multimodal emotion data. During the analysis phase, Jussi, Paul, and I were each responsible for cleaning one of the three data sets: gaze, biometric, and touch respectively. We met weekly via Skype to discuss data format, cleaning techniques, and final feature set calculations, sharing our findings and data sets. Classification scripts to generate results were pair-programmed by myself and Paul. I drafted the initial manuscript, although the entire team was involved in multiple editing passes. Chapter 3 includes all of our results and findings; while not the exact final journal text, it will bear a close resemblance.

The robot behaviour recognition of Chapter 4 (Behaviour Sketching) is an early study that has informed current work on an expanded set of robot behaviour generation and interpretation.
The content here has not appeared elsewhere and will be included in a larger work, as the first of a sequence of design studies. It was also a collaboration between myself, visiting PhD student Merel Jung of the University of Twente in the Netherlands, Dr. Jussi Rantala, and Paul Bucci. Study design refinement was a full group effort built on ideas generated by Merel and myself. Behaviour creation and participant materials were designed by myself, Jussi, and Merel, with final study interface coding done in pair-programming fashion by myself and Jussi. Robot hardware is attributed to Paul Bucci alone. Data analysis was planned and conducted by myself and Merel, and while I took the lead on writing, edits were made iteratively by the whole team.

This work was approved by the University of British Columbia (UBC) Behavioural Research Ethics Board (BREB) with ethics number #H15-02611. Study forms can be found in Appendix A.

Finally, I take responsibility for the concept of this thesis, chapter integration, and all other formal writing, including any and all errata.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
Glossary
Acknowledgments
Dedication

1 Introduction
  1.1 Background
    1.1.1 Robot platforms
    1.1.2 The CuddleBot and CuddleBits
    1.1.3 Sensing technology and classification techniques
    1.1.4 Interaction Styles
  1.2 Approach
  1.3 Thesis Organization

2 Gesture Classification
  2.1 Introduction
    2.1.1 Questions and Contributions
    2.1.2 Applications
    2.1.3 Detailed Requirements
  2.2 Related Work
    2.2.1 Social & Affective Touch Communication
    2.2.2 Flexible Pressure-Location Sensors
  2.3 Studies
    2.3.1 Apparatus
    2.3.2 Methods
    2.3.3 Analysis and Results
  2.4 Discussion
  2.5 Conclusions
  2.6 Future Work
  2.7 Acknowledgments

3 Affect Detection
  3.1 Introduction
    3.1.1 Approach and Research Questions
    3.1.2 Contributions
  3.2 Related Work
    3.2.1 Emotion Set
    3.2.2 Emotion Elicitation
    3.2.3 Modalities
  3.3 Methods
    3.3.1 Experimental Setup
    3.3.2 Procedure
  3.4 Results
    3.4.1 Pressure-location Feature Set: Emotion Classification
    3.4.2 Frequency Domain Feature Set: Emotion Classification
    3.4.3 Participant Classification
    3.4.4 Feature Set Analysis
    3.4.5 Experienced Emotion Trajectory and Interview Results
  3.5 Discussion
    3.5.1 Findings
    3.5.2 Experimental Methodology
    3.5.3 Application Implications and Future Work
  3.6 Conclusion

4 Behaviour Sketching
  4.1 Related Work
  4.2 Experiment Method
  4.3 Results
  4.4 Discussion

5 Conclusions and Future Work
  5.1 Outcomes and Impacts
  5.2 Future Work

Bibliography

A Supporting Materials
  A.1 Study Forms
    A.1.1 Consent Form
    A.1.2 Call for Participation Form
  A.2 Participant Response Forms
    A.2.1 Gesture Study Demographic Questionnaire
    A.2.2 Affective Rating Form
    A.2.3 Robot Behaviour Interview Script

List of Tables

Table 1.1 An overview of robot platforms intended for therapeutic use.

Table 2.1 Touch gesture instructions as provided to participants.

Table 3.1 Experimental Procedure Summary

Table 3.2 Summary of the condition sets for Neutralization and Emotion tasks. For example, one participant session consisted of R1 neutral task, Stressed emotion task, R2 neutral task, and Relaxed emotion task.

Table 3.3 Touch gesture instructions as provided to participants [19].

Table 3.4 A motivating overview of analysis techniques.

Table 3.5 A motivating overview of analysis conditions.

Table 3.6 Results from classifying Emotion using 20-fold CV on pressure-location features for Touch (T), Touch + Gaze (T+G), and Touch + Gaze + Biometrics (T+G+B)

Table 3.7 Results from classifying Emotion using 20-fold CV on Touch Frequency (TF); Touch Frequency + Touch pressure-location (TF+T); and Touch Frequency + Touch pressure-location + Gaze Frequency (TF+T+GF) feature sets.

Table 3.8 Results from classifying Participant using 20-fold CV on pressure-location features for Touch (T), Touch + Gaze (T+G), Touch + Bio (T+B), and Touch + Gaze + Biometrics (T+G+B)

Table 3.9 Results from classifying Participant using 20-fold CV on Touch Frequency (TF); Touch Frequency + Touch pressure-location (TF+T); and Touch Frequency + Touch pressure-location + Gaze Frequency (TF+T+GF) feature sets.

List of Figures

Figure 1.1 One complete iteration of the Human-Robot Interaction loop [110]: (1) Human expresses emotion; (2) Robot recognizes Human signal; (3) Robot expresses reaction to interpreted human expression; (4) Human recognizes Robot expression

Figure 1.2 A collection of therapy robots designed to study the social and physiological impacts of robots-as-companions on human lives: (a) Paro; (b) Huggable; (c) Probo; (d) Haptic Creature; (e) CuddleBot.

Figure 2.1 (a) Top view of the CuddleBot skeleton. (b) Touch sensor, pinned to foam substrate wrapped around the skeleton and corresponding to a No touch × No motion × No cover condition. (c) Full range of breathing motion used. (d) The fully-covered robot; a covering of nearly identical material was used in the study to facilitate quick condition changes. (e) The fabric pressure sensor constructed out of EeonTex conductive fabric (www.eeonyx.com), wired to an Arduino microprocessor.

Figure 2.2 Mean gesture prediction accuracy rates with added pressure noise when (a) varying substrate or cover in Study 1 and (b) varying motion and cover on the same curved structure as in Study 2. Each bar represents an average accuracy rate over 10 trials; error bars are omitted as ∆ across trials < 0.001% in each case.

Figure 2.3 A modified Hinton confusion matrix for gesture classification. Horizontal (row) gestures are classified as the vertical (column) gesture. Saturation in non-diagonal squares represents the number of misclassifications.

Figure 2.4 Cohen's d effect sizes of participant by gesture for each study.

Figure 2.5 Mean subject recognition rates by gesture and study over 10 trials; error bars omitted as ∆ across trials < 0.01% in each case. Study 2b refers to the 'in-motion, with cover' condition.

Figure 2.6 Top features as selected by Weka for each study. Classification tasks are Gesture and Subject, by Location (Cx, Cy) and Pressure features. Features selected at under 25% frequency in 20-fold cross validation are omitted.

Figure 3.1 The experiment setup: the participant sits comfortably supported by pillows, facing the gaze tracker with her hand on the touch-sensitive surface of a stationary robot. Biometric sensors are worn around the waist (respiratory rate), thumb (blood-volume pulse), and index and ring fingers (skin conductance) of the resting hand. One camera captures eye movements and another is raised on a tripod behind the participant to capture hand motions over the robot. Both cameras have audio disabled for privacy. When the participant pulls the rope, a ball leading outside the room indicates to experimenters that the emotion task is complete.

Figure 3.2 Our robot was constructed from pliant plastic sheets actuated by a pulley, covered with a custom-built sensor, and finally wrapped in a furry fabric to invite touch. It remained stationary during the course of this study to eliminate any effect of participant reaction to robot motion.

Figure 3.3 The data recorded during a single emotion task is referred to as a capture. To determine the effect of varying window size on accuracy rates, we tested at 2s, 1s, 0.5s, and 0.2s windows. Assuming that there may be similarities learned when trained and tested on directly adjacent windows, we also compared accuracy rates of data with and without imposed gaps to remove adjacent window instances.

Figure 3.4 Classification performance by modality set, level of system knowledge (where "labels in" indicates with knowledge), window size, and with/without gaps. The Emotion row displays results from classifying emotion, with and without Participant labels. The Participant row displays results from classifying participant, with and without Emotion labels. For both cases, increasing the number of modalities improves accuracy rates, where the inclusion of biometrics provides the strongest classification rates. However, in the most rigorous test of emotion classification, Leave-one-out (LOO), all modalities perform at or near chance for all window sizes. Regardless of classification task, window size has a small or positive effect on accuracy rates, except for the case of 2s-windows-with-gap. This is likely due to a high decimation of data, i.e. 2s-windows-with-gap has 13% of the number of data instances of 1s-windows-with-gap.

Figure 3.5 A bar graph of accuracy rates for every LOO test by participant. Most of the accuracy rates are roughly chance = 25%, and low SD suggests the variance of this classification was low across window sizes. Notice a few interesting outliers: test sets including P22 and P05. P22 has much higher classification rates, suggesting that P22 may be similar enough to the group to have their emotion behaviours consistently identified. Conversely, P05's particularly low rate of classification suggests that P05 expresses emotions contrary to the group.

Figure 3.6 Feature selection popularity of pressure-location features of Touch and Frequency features of Touch and Gaze.

Figure 3.7 Feature selection popularity by statistic. Pressure-based touch features are most popular, followed by location-based touch, with gaze frequency features on location data third, when normalized against the total number of features available by modality.

Figure 3.8 Changes in individuals' self-reports of emotion after Neutralization (start) and Emotion tasks (finish); N=14 for Stressed & Relaxed and N=16 for Depressed & Excited. Overall, we see a move from the origin to the representative quadrant. Stressed and Excited show the strongest overall change along both Arousal and Valence axes. Relaxed shows the least change, with disconnected points referring to "no change" from neutral state.

Figure 4.1 The rigid RibBit (left) and fur-covered FlexiBit (right) explore very different form factors using similar actuation principles and requirements. Both can be compressed without damage, allowing for a more naturalistic haptic display.

Figure 4.2 Experimental setup showing a participant touching the FlexiBit and rating behaviours. The screen's quadrants present the four situation descriptions.

Figure 4.3 Close-up of the interface participants used to rate behaviours.

Figure 4.4 Waveforms of behaviours as designed by researchers.

Figure 4.5 Mean behaviour ratings for FlexiBit grouped by the researcher-designed behaviours (horizontal) and the situation for which the behaviours were rated by participants (vertical). Researcher-designed behaviours correspond with (a) to (h) in Fig. 4.4.

Figure 4.6 Pairwise comparison p-values (Wilcoxon) of behaviours (row) for different situation conditions (col); sig. diff. are darker. Notice RibBit (S-E, D): in the Depressed condition, Stressed and Excited were rated significantly differently.

Glossary

BRV  Breathing Rate Variability
BVP  Blood Volume Pulse
DIY  Do It Yourself
DOF  Degree-of-Freedom
FFT  Fast Fourier Transform
FSRS  Force Sensing Resistors
HRI  Human-Robot Interaction
HRV  Heart Rate Variability
KS  Kolmogorov-Smirnov
LOO  Leave One Out
RR  Respiratory Rate
SC  Skin Conductance

Acknowledgments

I am indebted to many people for their support.

Many thanks are owed to my supervisor, Dr.
Karon MacLean, for believing in me and inviting me to join the SPIN lab, an experience which has facilitated so much learning and success, not least of which are the relationships that have developed.

I am grateful to my second reader, Dr. Giuseppe Carenini, for the insightful comments and questions; this is a better work for it.

And to my labmates, collaborators, and friends: Paul Bucci, Jussi Rantala, Oliver Schneider, Hasti Seifi, Matthew Chun, Merel Jung, Soheil Kianzad, Dilan Uztek, Meghana Venkatswamy, David Marino, Alicia Woodside, Lucia Tseng, Ben Clark, Sazi Valair, Jeff Allen, Andrew Strang, and Michael Phan-Ba. I have much appreciated the valuable feedback, and thank you all for making the SPIN lab so fun!

Of course, I owe so much to my family: my mom, Hui Wang, and my sister, Alice Cang, for their unending love and support.

Finally, I am forever glad for my partner, Oliver Trujillo, for standing unwaveringly in my corner. Thank you for being my best friend.

This work was funded by the Natural Sciences and Engineering Research Council of Canada.

Dedication

To my mom, who has always demonstrated that love is so much more than words. Thank you.

Chapter 1: Introduction

Communication between people is made richer with affective clues. We pick up on each other's eye contact, vocal inflections, body language, and touch behaviour as much as we do on words. Machine recognition of these kinds of cues improves the quality of Human-Robot Interaction (HRI) in collaborative tasks [36], intelligent tutoring systems [48], assisted driving [31], and the list goes on.

Using machines to recognize social touch specifically leverages natural human inclinations to express emotional closeness through physical contact and is gaining attention within affective computing [20, 33, 51, 93, 110]. As interpersonal touch encodes significant emotional content [43], investigating machine-sensed social touch is a stepping stone toward real-time emotion detection.
One such application, the therapy robot pet, takes advantage of the emotional communication in touch between human and animal without having to first address many of the complexities elicited in human-human interactions [78]. Therapy animals have long been shown to have physiologically measurable benefits for patients [10, 24]; for those who are unable to maintain long-term contact with pets (allergies, anxiety, cost), therapy robots have been employed with surprising success [46, 54, 103]. With better affective prediction, we can develop more naturally reactive therapeutic robots, approaching the touch-based human-animal interaction loop as defined by Yohanan et al. (2012) (see Figure 1.1).

Figure 1.1: One complete iteration of the Human-Robot Interaction loop [110]: (1) Human expresses emotion; (2) Robot recognizes Human signal; (3) Robot expresses reaction to interpreted human expression; (4) Human recognizes Robot expression.

Currently, there are obstacles to developing affective social touch recognition between a human and an animal-like robot pet, the biggest of which asks what parameters of touch behaviour convey emotional content and whether or not touch sensors can capture them. We focus here on those we think are significant and logical steps towards emotional communication in robot pet therapy. First, the robot's touch-sensitive skin must flex with motion yet maintain data integrity under deformation. Second, classification of affective state in touch is less mature relative to some other modalities [4] and has fewer established techniques. Integrating touch with multiple channels known to contain affect increases our confidence in emotion detection. Third, even once affect classification is solved, our interaction is not complete without robot response. Researchers have studied emotions as exhibited in the behaviour of many different animals [12]; however, it is unclear whether a robot simulation of affective expression is still human-identifiable as emotion.
Meanwhile (fourth), we must be mindful of real-time viability throughout the entire interactive cycle – from sensing human touch, to predicting affective state, to developing appropriate robot responses – and consider ways to trim computational cost wherever possible.

In this thesis, we focus on the therapy robot pet application and describe a custom sensing mechanism as a method for extracting touch data, supported by three distinct studies. In the first, we compared classification of touch gestures under a variety of deformation conditions, allowing us to recommend a set of conditions that balances user preference with data quality. The second study collects emotive touch data, supported by gaze and biometric sensors. The multimodal approach helps us understand how users express emotion, and compares affect classification accuracy between touch alone and touch with support; this chapter concludes with design recommendations for an affective robot pet. To close the loop, we investigate emotional expressiveness in robot breathing, where participants identify robot expression of affect while interacting with small single Degree-of-Freedom (DOF) robots performing a variety of breathing patterns. Finally, we outline the outcomes and impacts from this body of work and ground our findings in future work for furthering the therapeutic robot pet.

1.1 Background

Here we describe the bigger-picture background that leads us to this work. More focused literature reviews are included in each forthcoming chapter.

1.1.1 Robot platforms

Multiple robot form factors attempt to imitate animal therapy success, though it is not yet clear which characteristics generate the measurable physiological improvements in cardiopulmonary pressures, neurohormone levels, and anxiety in patients [24].
Some are plush, cuddly versions of larger mammals, like Paro, a seal, or the Huggable, a teddy bear; others have no Earth-born analogue, like the furred, green Probo with a long articulated nose, or the Haptic Creature, a round, furry object reminiscent of a cat/rabbit hybrid. Each robot has distinct sensory and actuation capabilities (see Table 1.1), but all are designed to allow for common pet interactions, including stroking and hugging.

Figure 1.2: A collection of therapy robots designed to study the social and physiological impacts of robots-as-companions on human lives: (a) Paro; (b) Huggable; (c) Probo; (d) Haptic Creature; (e) CuddleBot.

Table 1.1: An overview of robot platforms intended for therapeutic use.

Paro [46, 90]
  Modalities: sight (light v dark); sound (direction and speech); balance; touch (~2 inch taxels)
  Actuation (DOF): neck (2); front paddle (1); rear paddle (1); eyes (2)
  Special features: weighs 2.7 kg; pacifier; reacts to stroke and hit; pacemaker-friendly
  Form: baby harp seal
  Surface: soft white fur

Huggable [94, 95]
  Modalities: touch (temperature, force, electric field); sight (video camera in eyes); sound (microphone); balance (inertial measurement unit)
  Actuation (DOF): neck (3); eyebrows (2); shoulders (2); ears (1)
  Special features: recognizes 9 gestures; wireless connectivity; speaker for audio output
  Form: teddy bear
  Surface: soft butterscotch fur

Probo [86]
  Modalities: sight (digital camera); sound (microphone); touch (position sensors, temperature)
  Actuation (DOF): eyebrows (4); trunk (3); mouth (3); head (3); eyes (3); eyelids (2); ears (2)
  Special features: wifi-enabled touch screen; classifies 3 touches; facial recognition; speech analysis
  Form: elephant caricature
  Surface: soft green felt

Haptic Creature [108]
  Modalities: touch (60 FSRS over entire body); balance (internal accelerometer)
  Actuation (DOF): ears (2); breathing (1); purr (1)
  Special features: fiberglass shell
  Form: cat/rabbit-like form
  Surface: soft brown synthetic fur

CuddleBot [3, 19]
  Modalities: touch (256-taxel fabric); balance (accelerometer and gyroscope)
  Actuation (DOF): head (2); ribs (1); spine (1); purr (1)
  Special features: WiFi-enabled; 3D-printed skeleton
  Form: guinea pig-like form
  Surface: soft minky

Paro

The choice of emulating a baby harp seal exploits cuteness and recognizability, without being too familiar and pre-empting users' expectations of behaviour [104]. Paro is fully autonomous, with tactile sensing systems that allow it to recognize whether it is being stroked or hit, adjusting behaviour as appropriate [46]. The custom ubiquitous tactile sensors employed on Paro [90] are pictured as large palm-sized pressure sensors and placed where touch contact is most likely (the head and back), while avoiding more difficult-to-manage locations like the joints. Use of Paro as a companion in care homes for senior citizens has been well documented, particularly as a comfort-animal surrogate for dementia patients [21, 35].

Huggable

The Huggable is primarily designed for pediatric care and is intended as an augmented comfort object for children experiencing the stress of hospitalization [96]. An accompanying web-based logging program can be used to monitor the patient's distress levels through video and audio channels, as well as touch or behavioural data. The web interface also enables an operator, such as a therapist or caregiver, to control Huggable's many actuators and react to a patient's behaviour.

Probo

Probo, also intended for engagement with children, relies mainly on facial expression of emotion. The cartoon elephant sports an articulated trunk as well as an interactive touch screen installed in the belly [86].
Like the Huggable, Probo requires an operator to communicate with the child, where interaction is intended to evolve into a friendship [38].

Haptic Creature

The Haptic Creature was developed to highlight touch-based interactions in social robot therapy, reducing visual cues as much as possible [108]. While there are multiple actuation avenues, simple regular breathing motions from the lifting fiberglass body plates are enough to elicit significant calming effects, as demonstrated by biometric measures of reduced heart and breathing rates [88].

All of these robot platforms target therapy and are equipped with impressive kinematic abilities; our goals for complex and autonomous affective touch communication, however, require more sophisticated sensing and processing of affective touch signals. In fact, only Paro is actually intended and equipped to react to users in the absence of a human interpreter and operator. Our desired interaction, however, assumes a larger and more complex emotion set, necessitating comprehensive full-body touch sensing as well as an embedded touch prediction engine.

1.1.2 The CuddleBot and CuddleBits

To build upon the developments of the Haptic Creature, other members of the lab created the CuddleBot, a new platform for the study of robot therapy [3]. Improvements include increased touch sensitivity covering the entire body, more realistic motion from a 3D-printed, articulated full skeletal structure on 5 DOF, and Wi-Fi connectivity for use with a web interface to define and adjust behaviours. The core structure is driven by modular, centrally positioned, rod-driven actuators, as the robot designers were mindful of keeping the kinematics easy to modify, enabling quick exploration into design alternatives.
The CuddleBot, like Paro, is designed to be fully autonomous, but with a more complex on-board real-time affect interpretation engine, able to interpret emotional expression in touch and react accordingly. Many engineering parameters were considered when designing the CuddleBot, including full modularity so that each degree of freedom could be developed and studied independently. To explore the expressivity of individual actuation methods, a family of single-DOF robots, dubbed the CuddleBits, was created [18]. Internal tools were also developed, enabling the sketching and fine-tuning of behaviours to investigate the perception of emotional expression by these simple machines [15].

For either robot platform, the affective prediction model, its mapping to a set of robot responses, and the human perception thereof remained undeveloped.

1.1.3 Sensing technology and classification techniques

Classification procedures depend on the polling rate, resolution, format, and expected purpose of the data collected. If we only intend to distinguish between two touch gestures, say stroke and hit as in the case of Paro [46], the larger ubiquitous tactile sensor [90] in conjunction with a prediction system that detects a pressure threshold is sufficient. For classification of more nuanced touch, however, not only are more taxels of higher resolution required, but also more sophisticated processing.

Sensing

High-resolution sensors, such as those employed in robot grippers, are well developed for very precise, dextrous manipulation and are successful for telemanipulation or even robot-assisted telesurgery [63].
However, this level of precision far exceeds our needs for affective or social touch gesture interpretation, and would be excessively expensive, both financially and computationally.

A survey of other touch sensors, including Force Sensing Resistors (FSRs), grid-based pressure sensors (available at www.plugandwear.com), and fabric stretch sensors (www.vista-medical.com), revealed that they either must be affixed to a rigid substrate, were not capable of multi-dimensional stretch, or did not reflect multi-contact touch. Other requirements include fully stretchable, full-body continuous taxel coverage, while remaining financially and computationally cheap. Unfortunately, existing sensors could not satisfy all criteria (requirements and touch sensors are described in more detail in Section 2.1.3 and Section 2.2.2 respectively).

Classification

Classification procedures that make use of both pressure and location parameters have performed at up to 94% accuracy (chance 11%) with random forest models built on gestural touch data [33]. In contrast, machine recognition of affective touch alone does not perform nearly so well, with results at up to 48% accuracy (chance 11%) [4]. By enhancing unimodal touch data with information from other comparably nonverbal sources, however, we may be able to do much better.

Affective state directly influences physiological measures [32, 62] and has been classified with accuracies as high as 95% (chance 25%). Tracking gaze behaviour has also been promising in determining emotional state and attention or interest, leading to improvements for intelligent tutoring systems [48] and educational games [25], to name a few.

Affect recognition in touch may benefit from integrating multimodal support from gaze and biometric channels, both of which are associated with commercially available sensors and feature extraction software.
As far as we know, there does not currently exist a study that triangulates even two of these three signals.

1.1.4 Interaction Styles

To complete the touch interaction loop, we require that on-board robot sensing, affect classification, and motion response work together. Looking at one DOF at a time, we begin by focusing on breathing behaviours due to their range of emotional expressivity [13] as well as their proven ability to reduce symptoms of stress [88]. Section 4.1 delves into greater detail on robot behaviour complexity and believability.

We consider possible designs for the interaction model in two relevant interaction types, as defined by Sharp [89]: Instructing – where users issue instructions to a system – and Conversing – where users have a dialog with a system.

Instructing interactions can be conceived of as a continuously listening Robot waiting for Human direction. Once the recognition engine provides a prediction, the Robot performs the appropriate mapped behaviour and returns to a listening phase – completing one iteration of Yohanan's interaction loop (see Figure 1.1), where each iteration is completely independent. Producing appropriate responses requires that our prediction system perform with high accuracy and be able to gauge user intent with little room for error.

Conversing interactions are better suited to affective response: the Robot continuously listens for Human input, whereupon touch input triggers a prediction, generating a Robot behaviour. Upon detecting the Human's reaction to the earlier response, the Robot reflects on the previous behaviour and decides either to correct or to continue the reaction.

While many human-human or human-animal interactions are well adapted to instructing interactions, emotional communication is much better described in the conversing style. Whether engaging with other humans or with animals, we assume some rate of error and continuously evaluate and correct behaviour based on increasing information.
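A conversing iteration as described above can be sketched schematically. In the sketch below, the predict, behave, and evaluate hooks are hypothetical stand-ins for a recognition engine, a behaviour mapping, and a reaction check; they are not components of any system described in this thesis.

```python
# Schematic sketch of one Conversing-style iteration (not thesis code):
# listen -> predict -> behave -> observe the Human's reaction -> correct or
# continue. The predict/behave/evaluate hooks are hypothetical stand-ins.

def conversing_step(frame, predict, behave, evaluate, last_behaviour=None):
    """Run one iteration of a conversing interaction loop.

    predict(frame)             -> an affect label, e.g. 'stressed'
    behave(label)              -> a robot behaviour for that label
    evaluate(frame, behaviour) -> True if the Human's reaction suggests the
                                  previous behaviour was appropriate
    """
    if last_behaviour is not None and not evaluate(frame, last_behaviour):
        # The Human's reaction suggests the last response missed: correct it.
        return behave("corrective")
    return behave(predict(frame))

# Toy usage with stub hooks:
label_of = lambda frame: "stressed" if sum(frame) > 5 else "relaxed"
behaviour_of = lambda label: "breathe-" + label
approved = lambda frame, behaviour: behaviour != "breathe-stressed"

b1 = conversing_step([9, 1], label_of, behaviour_of, approved)      # 'breathe-stressed'
b2 = conversing_step([0, 1], label_of, behaviour_of, approved, b1)  # corrected
```

An instructing interaction is the special case where last_behaviour is always discarded: each iteration predicts and acts independently, with no reflection step.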
Although this more realistic model is outside the scope of this thesis, we lay the groundwork for modelling this kind of emotional human-robot conversation.

In either interaction style, interesting and believable human-robot interactions necessitate a large and complex behaviour set where each behaviour is human-recognizable as an emotional response. Unfortunately, generating emotional robot behaviours is difficult without more insight into the perception of platform-specific robot behaviours.

1.2 Approach

The overarching goal of this thesis is to explore and improve affect recognition in touch behaviour and expand its use for a variety of emotionally intelligent applications. We focus on social robot therapy because this application is easy to motivate for participants and naturally elicits human emotional expression.

To satisfy the sensing needs of a fully touch-sensitive robot pet in motion, we customized a Do It Yourself (DIY) fabric touch sensor of 1-inch taxels in a 10×10 array to be consistent with human social touch behaviours in terms of pressure range and polling rate [92]. The sensor reports the pressure and location dimensions required for gesture recognition [33] and is used in both the gestural and affective touch studies (sensor construction is fully detailed in Section 2.3.1).

The literature reflects an assumption that there is emotional encoding within gestures [4, 33], and the recognition engines of other robot platforms build on this scaffold. For example, Paro responds to stroke positively and hit negatively [46], and the Huggable and Probo detect nine [96] and three [86] touch gestures, respectively, to which they react emotionally.
While we acknowledge this is a reasonable approach, we choose to explore an independent relationship between gestural and emotional behaviour, electing instead to perform distinct collection procedures and report classification results separately.

Machine recognition of affect in touch remains largely unsolved: we have not yet reliably pinpointed the aspects or artifacts of touch that decode emotional state. However, since emotions are experienced multisensorially, we can borrow from the classification techniques of more mature sensing channels and introduce a multimodal approach by integrating gaze and biometric support with touch. We then compare touch-only performance with touch plus additional modalities and discuss the use of multimodal sensing across a number of applications.

Finally, we evaluate the expressivity of breathing by designing a number of behaviours on the 1-DOF CuddleBit robots and compiling user reactions to them. By evaluating each stage of the interaction loop independently, we isolate requirements and limitations, gaining a deeper understanding of how to engage in the full HRI touch interaction loop. While out of the scope of this thesis, these results begin to extend naive loop iterations to smarter behaviours with error correction, and improve the HRI experience.

1.3 Thesis Organization

The remaining chapters of this thesis are organized by the studies conducted. In Chapter 2 (Gesture Classification), we describe the sensor used in both touch interaction studies and perform gesture classification. A major goal of this study was to assess the impact of sensor motion, substrate, and coverings on recognition performance. We collected the six gestures most relevant in a haptic social robot context plus a no-touch control (N1 = 10, N2 = 16) and ran classification on a random forest model of 100 trees using Weka, an open-source machine learning program.
Results allowed us to conclude that under realistic conditions (CuddleBot in motion, with foam substrate, and under a fur cover), recognition using our custom sensor is sufficient for many applications of social touch, including affective or functional communication, from physically interactive robots to any touch-sensitive object.

Chapter 3 (Affect Detection) describes our multimodal collection and classification of emotional expression in touch (measuring force magnitude or pressure, and location), gaze (location), and biometric (skin conductance, blood volume pulse, respiration) data. We asked participants (N=30) to relive intense emotional memories while touching a stationary, furry robot, eliciting authentically experienced emotions under controlled lab conditions. Each participant interacted with the robot while experiencing two opposing emotions: Stressed-Relaxed or Depressed-Excited. To better understand the data, we partitioned it by the level of system knowledge of participant information [labelled, unlabelled, leave-out] and across time-varying windows [2s, 1s, 0.5s, 0.2s]; we found that classification accuracy improves with increasing system knowledge of the participant. Adding the gaze and biometric modalities improves accuracy only when a participant's instances are present in both training and test sets. To address the computational efficiency required of dynamic and adaptive real-time interactions, we analyze relative subsets of our multimodal feature set and provide design recommendations for our therapeutic robot.

The last stage of a single loop iteration requires a robot reaction, investigated in Chapter 4 (Behaviour Sketching). We describe our design of breathing behaviours via direct waveform modification, displayed on the CuddleBits: one flexible, furry FlexiBit and the other a rigid, wooden RibBit.
Emotive behaviours were designed by the experimenters and interpreted by N=20 participants based on an arousal/valence emotion grid, specifically Excited, Stressed, Relaxed, and Depressed. Our findings indicate that these simple robots haptically conveyed emotion with success similar to that of more complex systems.

Finally, Chapter 5 (Conclusion) highlights the outcomes and impacts of each study and describes future work to improve behaviour design in a human-robot conversation.

Chapter 2

Gesture Classification

Earlier robot constructions such as the Haptic Creature have used FSRs as the touch sensing system [109], requiring a firm plexiglass construction that was heavy and offered insufficient degrees of freedom for convincing, complex motion. A full redesign yielded the CuddleBot [3], whose light 3D-printed skeleton necessitated an even lighter-weight sensing system. We needed a touch-sensitive skin that could flex around multi-DOF motion without losing data integrity, kept a low-computation profile to facilitate real-time processing, and was inviting to touch. No sensor quite fit all of our requirements until we found inspiration in maker culture. Following existing DIY-sensor guidelines [77], we made a custom modification using commercially available conductive fabrics in order to capture both the pressure and location dimensions required for gesture recognition [33]. While this sensor is not presented as a contribution, we highlight its development in this chapter. The same sensor is used to collect touch data in Chapter 3.

This chapter features a conference paper previously published at the ACM International Conference for Multimodal Interaction (ICMI '15) [19], presented here in full. Prior to attempting the more complex affect recognition, it was necessary to verify that the use of our custom fabric sensor in gesture recognition tasks demonstrated results consistent with the literature. This paper also highlights the noise concerns when engaging the CuddleBot robot in real use.
We present results from data collected under increasingly difficult conditions (stationary to moving; firm, flat substrate to soft, curved foam; no cover to high-density fur) and recommend a configuration that balances performance with user preferences.

1 Sensor fabric purchased from 〈www.eeonyx.com〉

Abstract

Social touch is an essential non-verbal channel whose great interactive potential can be realized by the ability to recognize gestures performed on inviting surfaces. To assess the impact on recognition performance of sensor motion, substrate, and coverings, we collected gesture data from a low-cost multitouch fabric pressure-location sensor while varying these factors. For the six gestures most relevant in a haptic social robot context plus a no-touch control, we conducted two studies, with the sensor (1) stationary, varying substrate and cover (n=10); and (2) attached to a robot under a fur covering, flexing or stationary (n=16).

For a stationary sensor, a random forest model achieved 90.0% recognition accuracy (chance 14.2%) when trained on all data, but as high as 94.6% (mean 89.1%) when trained on the same individual. A curved, flexing surface achieved 79.4% overall but averaged 85.7% when trained and tested on the same individual. These results suggest that under realistic conditions, recognition with this type of flexible sensor is sufficient for many applications of interactive social touch. We further found evidence that users exhibit an idiosyncratic 'touch signature', with potential to identify the toucher. Both findings enable varied contexts of affective or functional touch communication, from physically interactive robots to any touch-sensitive object.

2.1 Introduction

Words can sometimes be inefficient for communicating instructions or affective content. In many contexts, touch may be the best modality for conveying directive and emotion: imagine how quickly and clearly one simple touch can inform someone to get out of the way.
To harness this communication channel, social robots working in tandem with humans must recognize the same haptic language that we use, of which gestures and affect are key components.

We focus on exploring the range of touch gestures detectable by a custom-built flexible fabric pressure sensor and evaluating the noise added by curvature, motion, and material cover. Using common machine learning techniques, we highlight salient features of touch for recognizing both the type of gesture being performed and the person performing the gesture—both the 'touch' and the 'toucher'.

Reliable gesture recognition is an important step towards further research in the field of affective touch. A strong foundation of research on gesture may allow us to detect the toucher's emotional state [50]. Until recently, this kind of research was difficult, as touch sensors were not easily deformable nor cheap in both price and computational resources. Our 10×10 sensor has 100 fingerpad-scale taxels recording pressure and 2-D location data, and we use a random forest classification method to approximate in situ recognition rates.

We first collected touch data for a set of six validated touch gestures [110] plus one control on a stationary sensor under a variety of substrate stiffnesses and coverings. We then mounted the same sensor on an actuated robot skeleton and collected similar data while varying the sensor's covering and motion (Fig. 2.1). Recognition rates were within 80–95% for all conditions we tested (chance 14.2%), a level of accuracy which will suffice for many purposes and is enough to merit empirical comparison to human recognition ability in future work.
At the same time, we found individuals' touch signatures were idiosyncratic enough to permit identification of the toucher within this sample, at an accuracy rate similar to that of the gestures themselves.

2.1.1 Questions and Contributions

We wished to learn:

Q1: How accurate is our flexible fabric sensor in predicting gesture and differentiating between users?

Q2: How does sensor performance hold up under deformation due to curvature and motion, such as that produced by a zoomorphic social robot?

Q3: Is real-time gesture recognition computationally viable?

With 20-fold cross validation on random forest models, we contribute initial results of:

• deployable accuracy in gesture recognition (6 gestures + control): 91.4% on a firm, flat surface, 90.3% on a foam, curved surface, and 88.4% on a foam, curved, moving surface;

• differentiating toucher at 88.8% accuracy (n=26);

• factors underlying recognition performance;

• feasibility of real-time gesture recognition.

Figure 2.1: (a) Top view of the CuddleBot skeleton. (b) Touch sensor, pinned to foam substrate wrapped around the skeleton and corresponding to a No touch × No motion × No cover condition. (c) Full range of breathing motion used. (d) The fully-covered robot; a covering of nearly identical material was used in the study to facilitate quick condition changes. (e) The fabric pressure sensor constructed out of EeonTex conductive fabric 〈www.eeonyx.com〉, wired to an Arduino microprocessor.

We also make our data and analysis publicly available2.

Our study compares gesture recognition performance across a variety of conditions that approach real-time dynamic gesture recognition. Toucher recognition accuracy shows promise for incorporating personalized responses to an individual touch signature.

2.1.2 Applications

Accurate gesture recognition on a fabric touch sensor opens up gesture-based controls on any electronic device.
For example, patients with limited speech could use a smart blanket with gesture recognition capabilities for comfort or health-reporting purposes. In the context of social robots, a sensor that can wrap around any irregular form could be used as a touch-sensitive skin. Outside of explicit gesture recognition, pressure-sensitive hospital sheets could alert caregivers of bedsore risk.

2 All collected data and select analysis can be found at 〈www.cs.ubc.ca/labs/spin/data〉

In a behavioural education context, a soft touch-sensing playmate capable of recognizing touch signatures may use this data to interpret and influence emotional state [50]. Such a robot could aid students on the autism spectrum by responding to anxious or agitated strokes with slow, soothing, regulated breathing—a behaviour shown to have calming benefits [88].

2.1.3 Detailed Requirements

Our sensing requirements are dictated by a zoomorphic robot, affectionately dubbed the CuddleBot, that invites touch with a soft furry body. Since a user will expect to interact with the CuddleBot via touch, a full-body sensor that deforms with robot motion is required.

Movement and elasticity: The sensor must be highly flexible, somewhat elastic, and perform well while mounted on non-rigid and/or actuated surfaces.

Pressure range: Based on a preliminary survey of these touch pressures, we determined that our sensor needed to register touches between 0.005 and 1 kg. This range is appropriate for light tickles to heavy pats.

Multitouch: Multitouch capability allows us to compute varying pressure over an area, differentiating touches like constant and pat from tickle and scratch.

Resolution and computational cost: Taxel resolution, sampling rate, and computational cost must be balanced to achieve usable recognition accuracy. For real-time use, our computational cost is dominated by sensor polling and grows with the number of taxels per grid edge.
Our recognition tasks and feature selection explicitly analyze the differences between frames. In this case, accuracy plateaus with fingerpad-scale taxels when sampled fast enough to capture voluntary movement (peaking at 10 Hz [92]). We must be able to recognize changes in pressure and localized hand motions up to this frequency.

Single-fingerpad resolution (≈2 taxels per inch) could capture small fluctuations; however, our gestures (not including our control, no touch) either involve the flat or palm of the hand (constant, pat, rub, stroke) or tend towards quickly crossing many taxels (tickle, scratch). This suggests that statistical features emphasizing the changes from frame to frame could achieve reasonable classification rates even at ≈1-inch taxels [33].

2.2 Related Work

We situate our work in the context of social robotics and affect-encoding social touch. Gestural touch has been identified as a key component of human-robot cooperation [7]. However, the semantics of that touch is conveyed through nuance. For example, the same gesture could halt, contribute to, or modify another person's behaviour [7] depending on the emotional content inferred from pressure dynamics [50].

2.2.1 Social & Affective Touch Communication

In collaboration with human workers, robots employed in a laboratory or workshop setting presuppose a lexicon of social touch for operational interactions [36]. To ensure safe and effective communication, Gleeson et al. identify the requirements of both a comprehensive gestural dictionary and lightweight sensing technology. The intimate nature of collaborative robotic household help emphasizes the importance of affect detection for social robots in this context [2, 79].

Previous work revealed correlations between gestural social touch and emotional communication [44, 50].
Humans recognize the affect encoded in gestural touch [43, 44], suggesting that machine recognition of emotional state can be achieved with sufficient sensing technology and clever feature extraction.

Much of the current work on social touch recognition uses a sensor worn on a static human or robotic arm [50–52, 93]. The collected data and signal processing procedures may not account for the added deformation noise of a soft-tissue zoomorphic robotic form in motion.

The use of animals [24] and interactive robots in animal form (such as Sony's pet dog AIBO [9, 97] and the seal-shaped PARO [46, 64, 91, 104]) suggests potential benefits in therapeutic use. Other touchable social robots include the teddy-bear-like Huggable [94] and Probo [85], which does not have a recognizable animal analogue. However, while real pets respond to complex touch commands anywhere on the body, this has been difficult to achieve without a generalized touch-sensitive skin.

In trying to establish zoomorphic robots as emotional agents [33, 110], touch sensing strategies have included fur-level conductive threads, extensive biometric data, gyroscopes, and accelerometers, to name a few. While this cavalcade of sensing produces encouraging results for social gesture classification [33, 110], it is far from the lightweight system required for automatic, real-time recognition.

An unexpected result emerging from social touch recognition is the demonstrably higher accuracy of within-subject classification over between-subject [33, 51].
Leveraging this result may allow us to use touch behaviours to identify individuals and thus recognize the nuances of an individual's "touch signature" to better predict touch gestures and, eventually, basic emotional content.

2.2.2 Flexible Pressure-Location Sensors

Real-time classification of social touch gestures on a flexing, noisy surface requires that we have manageable signal processing while retaining the ability to represent pressure and location. Here we examine the suitability of existing sensing technology and recognize its influence on our custom build. We do not present our sensor as a contribution.

While many highly accurate pressure-location sensors exist, such as those developed for robot grippers used in dexterous manipulation [82, 100], these tend to be insufficiently flexible, overkill in terms of resolution, and considerably too expensive for the objectives outlined here.

Other work has used Force-Sensing Resistors (FSRs) affixed to a hard shell [110]. This reduces the need to calibrate for sensor drift over continued use; however, the trade-offs are non-aesthetic tactility and difficulty in detecting touches between sensors—limiting rendered motion [7, 20].

Stretch sensors designed for medical purposes by Vista Medical3 are the foremost inspiration for our custom sensor. However, Vista's sensors recognized only pressure without localization and did not have multitouch capability.

3 Stretchable sensors can be purchased commercially from Vista Medical 〈www.vista-medical.com/subsite/stretch.php〉

Several multitouch, flexible fabric sensors are available [52]. However, flexibility alone does not afford a full range of motion; the sensor must be able to stretch and deform to approximate animal skin.

The design and sensing capabilities described by Flagg et al. [33] informed many of our requirements and suggested that the bulk of the recognition accuracy could be achieved by the "below surface" sensor alone.
However, Flagg's study did not consider the full design space of a robot in motion, including a non-sensing fur and a variety of configurations. To evaluate how much information is compromised under these conditions, we applied a variety of realistic-use noise sources to the sensor, both directly and indirectly.

2.3 Studies

We hypothesized that:

H1: gesture recognition rates will decrease with noise-creating factors—allowing us to rank these factors' impact on recognition performance, and their interactions therein.

H2: variability in gesture execution will be higher between subjects than within subjects—giving rise to the potential of differentiating individuals based on personal touch signatures.

2.3.1 Apparatus

We constructed a sensor by layering two squares of conductive EeonTex4 Zebra fabric, aligned at 90 degrees, with a plastic standoff mesh separator and a sheet of EeonTex SLPA 20kΩ resistive fabric. Resistance across a given taxel drops when pressure is applied, compressing the mesh separator so the conductive layers more closely approach each other. A circuit is constructed using an Arduino Mega microprocessor. Each fabric stripe is connected to a single I/O pin: the top layer is connected to analog input pins, and the bottom layer is connected to digital output pins (Fig. 2.1(e)).

4 Sensor fabric purchased from 〈www.eeonyx.com〉

The sensor is polled by sequentially sending a voltage through the bottom layer's digital pins. The analog pins read current; resistance (and hence current) varies with pressure.

Preliminary testing of our sensor using stationary weights showed that under ideal conditions, we were able to achieve a touch weight range of 0.005–1 kg using 1kΩ resistors. Under the most severe conditions, lighter touches were lost in the dense fur; at the heavier end, touches were equalized by the yielding foam substrate.
For Study 1, the curved-foam substrate with thick fur cover was the most obscuring condition; for Study 2, this was the cover condition with the bot in motion.

Dynamic range is modulated through the choice of resistor value. We found that values greater than 1kΩ allowed our sensor to register greater forces, but lost resolution; conversely, lower values gave greater granularity in recognizing very fine touches, but were too vulnerable to saturation at commonly applied force levels. The same sensor and microprocessor set were used in all studies described here.

2.3.2 Methods

Our two studies assessed how realistic conditions impacted sensor data and hence recognition accuracy; gestures and data collection procedures were unchanged.

Gestures and Sampling

We selected gestures from Yohanan et al.'s touch dictionary [110], choosing the items most appropriate for human-animal interactions [33]. The sensor was placed on a table in front of a seated participant, and a reference sheet with very general definitions for the six selected gestures and one control was provided (Table 2.1). Participants were instructed to interpret each gesture as they saw fit; no further performance clarifications were provided.

A frame consisted of pressure data from all 100 taxels in the 10×10 grid.
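Assembling one such frame from the sequential row/column scan described in Section 2.3.1 can be sketched as a host-side simulation. The pin-reading function and ADC-style values below are illustrative assumptions, not the Arduino firmware:

```python
# Host-side simulation (illustrative assumption, not the Arduino firmware) of
# the sequential row/column scan: drive one bottom-layer (row) pin at a time,
# read each top-layer (column) pin, and assemble a 100-taxel frame.

GRID = 10  # 10x10 taxel grid

def scan_frame(read_taxel):
    """Build one frame by polling every (row, col) taxel.

    read_taxel(row, col) stands in for driving the row's digital pin high
    and reading the column's analog pin.
    """
    return [[read_taxel(r, c) for c in range(GRID)] for r in range(GRID)]

# Toy example: one press near taxel (2, 3) on an otherwise idle sensor,
# in assumed ADC-like counts.
def fake_read(r, c):
    return 900 if (r, c) == (2, 3) else 10

frame = scan_frame(fake_read)   # frame[2][3] == 900, all other taxels == 10
```

Because each frame costs GRID × GRID pin reads, polling cost grows with taxel count per grid edge, as noted in Section 2.1.3.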
We collected 10 seconds of continuous hand touch data at 54 frames per second for each combination of gesture and condition, randomizing gestures and conditions wherever possible.

Table 2.1: Touch gesture instructions as provided to participants.

Gesture     Suggested Definition
no touch    no contact with the sensor (control)
constant    touch contact without movement
pat         quick & gentle touches with the flat of the hand
rub         moving the hand to and fro with firm pressure
scratch     rubbing with the fingertips
stroke      moving hand repeatedly
tickle      touching with light finger movements

Study 1: Cover and Substrate on Static Robot

We first measured gesture recognition for the static (unmoving) case, to assess the impact of the sensor's substrate stiffness, curvature, and covering thickness in the absence of movement noise. This produced a factorial design of 4×3×7 (cover × substrate × gesture), using the gestures listed in Table 2.1.

Cover: The fabric's pile or density varied from no cover (participant touched the sensor directly) to a very long, thick synthetic fur. Minky (a short furry fabric generally used for baby blankets) and a longer-furred fabric comprised the intermediate variations.

Substrate: The material underneath the sensor consisted of a firm, flat surface (sensor affixed by velcro to a table); a spongy foam, flat surface; and a spongy foam, curved surface. In cases with foam, the sensor was pinned directly to the foam substrate.

To minimize sensor reading disturbances due to transitions (i.e., unwrapping and replacing the sensor on/off the robot body), we blocked our design on the cover × substrate conditions. Condition order was randomly generated for every participant, and gesture order was further randomized over each condition set. All participants completed all twelve masking conditions, with each generating 48 2s sample windows per gesture. A study session took approximately 50 minutes to complete.
10 volunteers (4 female, 6 male) were compensated $10 for their time.22Study 2: Stationary vs Moving RobotOur second study focused on the impact of the robot’s breathing movement. Wevaried cover×motion×gesture, for a 2×2×7 factorial design. Factors consistedof cover = {cover, no cover}, motion = {breathing, not breathing}, and gesture ={set of seven gestures}. Each participant performed each condition combinationtwice in a randomly generated order.In the breathing condition, the sensor was attached to the CuddleBot, a cat-sized robot designed for therapeutic use. Fig. 2.1(a-b) shows the naked skeletonand the sensor pinned to the foam intermediary. The robot’s ‘breathing’ motionwas created by extending and contracting the paired rib assemblies in a 14◦ arcfrom the spine at 0.5Hz (Fig. 2.1(c)). We draped and pinned fabric over the sensor,approximating a full fur jacket for condition randomization while limiting sensordisruption (Fig. 2.1(d)),Each session began by asking the participant to interact freely with the covered,moving robot for 1 minute to reduce novelty. Each condition was then presentedtwice, in random order, for a total of ((2×2×7) + 1) = 57 trials. 16 participants (10female, 6 male) were compensated $5 for the 30 minute session, each providing 322s samples of each gesture for every condition.2.3.3 Analysis and ResultsWe discarded the first and last second of each 10s gesture capture and divided theremaining 8s into four 2s windows. The 2s window (at 54Hz) was chosen to alloweach gesture some periodicity; all gestures fit completely within 1s (Flagg [33]).Given the challenge of determining gesture boundaries in a realistic, real-time set-ting when a motion is steadily repeated, a 2s window allows capture of at least 1complete gesture cycle.To account for translatory gestures, we also calculated a centroid (average ge-ometric centre) weighted by the pressure reading for each frame. Centroids weredefined by row Cx (Eq. 2.1) and column position Cy (Eq. 
2.1 with i and j indices reversed):

C_x = \frac{\sum_{i=1}^{10}\sum_{j=1}^{10} i \cdot \mathrm{pressure}(i,j)}{\sum_{i=1}^{10}\sum_{j=1}^{10} \mathrm{pressure}(i,j)}    (2.1)

We calculated weighted pressure by summing readings across each row, multiplying by index, and dividing by the unweighted frame sum (the sum of the full frame sensor reading). Repeated for each column, this provided a tuple of frame sum and centroid per frame.

Figure 2.2: Mean gesture prediction accuracy rates with added pressure noise when (a) varying substrate or cover in Study 1 and (b) varying motion and cover on the same curved structure as in Study 2. Each bar represents an average accuracy rate over 10 trials; error bars are omitted as ∆ across trials < 0.001% in each case.

As a "baseline" for both studies, we sampled sensor frames in the absence of gestures. In Study 1, each of the 12 (4 cover × 3 substrate) condition sets contributed 4320 frames; in Study 2, each of the 4 (2 motion × 2 cover) condition sets contributed 6912 frames. To establish the effect of noise under each condition, we ran MANOVA over three frame-level dependent variables: pressure, Cx-coordinate, and Cy-coordinate. In all cases except one⁵, all three variables showed significant differences at the p < 0.001 level. This indicates that the sensor is sensitive to changes in these conditions.

The data fails the Shapiro-Wilk test of normality; however, visual inspections of residual Q-Q plots did not reveal any systematic patterns. Together with our large sample size (n > 4000 frames per condition), we proceeded with the normality assumption, alert to the risk of inflated Type I error.

The six gestures (omitting no-touch data) were then compared with each other under the conditions of each study. MANOVA over the same three metrics (pressure, Cx, Cy) showed that gesture and participant combinations were statistically significant (p < 0.001).
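The per-frame quantities above (frame sum plus pressure-weighted centroid, per Eq. 2.1) could be computed along these lines. This is a sketch of the described computation, assuming a 10×10 taxel grid (per the summation limits in Eq. 2.1) and NumPy; function names are our own.

```python
import numpy as np

def frame_features(frame):
    """Per-frame pressure sum and pressure-weighted centroid (Eq. 2.1).

    frame: 10x10 array of taxel pressure readings.
    """
    total = frame.sum()
    rows = np.arange(1, 11)  # i = 1..10
    cols = np.arange(1, 11)  # j = 1..10
    cx = (rows[:, None] * frame).sum() / total  # row-weighted centroid
    cy = (cols[None, :] * frame).sum() / total  # column-weighted centroid
    return total, cx, cy

# Uniform pressure puts the centroid at the geometric centre (5.5, 5.5).
total, cx, cy = frame_features(np.ones((10, 10)))
assert (total, cx, cy) == (100.0, 5.5, 5.5)
```

A translatory gesture such as stroke then shows up as a drifting (cx, cy) trajectory across the 2s window even when total pressure stays roughly constant.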
Differences in participant touch were detectable at frame level.

We calculated seven features across these three dimensions (frame value, Cx, Cy) for each 2s window for a total of 21 features. For each dimension, features are {maximum, minimum, mean, median, variance across all frames, total variance within the 2s window, area under the curve}. Condition variables (curvature, fur) or (cover, motion) make up the other features. Participant labels were included for gesture predictions and vice versa.

Each capture produced four 2s windows, providing repetition for training. Pairwise comparisons of all within- and between-capture windows generated two binomial distributions for statistically significant pairs using two-sample Kolmogorov-Smirnov (KS) tests. Permutation testing [37] using the KS test statistic did not detect a statistically significant difference (p = 0.214) between the distributions. This is consistent with our observations of participants varying touch behaviour both between and within captures.

We used Weka, an open-source machine learning application, to classify gestures [41]. Flagg's comparison of random forest and a number of other algorithms showed that random forest performed best in gesture recognition of this kind [33]. We ran k-fold Cross Validation (CV) on Study 1 participant data for k = {5, 10, 20, 100} and found less than 1% improvement between 20 and 100 folds. While this CV technique does ensure that any one instance is included in the test or training set and not both, it cannot promise subject-independent classification.
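The k-fold scheme and its limitation can be made concrete with a small sketch (pure Python, our own illustration rather than Weka's implementation): every instance lands in exactly one test fold, but nothing prevents windows from the same participant appearing in both train and test sets.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Simple k-fold CV split: yields (train, test) index lists such that
    each of the n instances appears in exactly one test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test

# With 20 folds, each window is tested exactly once -- but windows from the
# same participant may sit in both train and test, so this does not give
# subject-independent evaluation (hence the LOO comparison in the text).
splits = list(k_fold_indices(n=100, k=20))
tested = sorted(i for _, test in splits for i in test)
assert tested == list(range(100))
```

Subject-independent evaluation would instead hold out all windows from one participant at a time.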
Running Leave One Out (LOO) classification yielded slightly improved results, but we were cautious of the inflated bias [58]. All reported classification performance is therefore based on the slightly conservative 20-fold cross validation of random forest models. Accuracy is defined as the percentage of data instances that are correctly classified.

⁵For Study 2, the condition of with-cover × with-motion under no touch did not show statistically significant differences in Cx data.

Figure 2.3: A modified Hinton confusion matrix for gesture classification. Horizontal (row) gestures are classified as the vertical (column) gesture. Saturation in non-diagonal squares represents number of misclassifications.

Gesture Classification by Condition

H1: Gesture recognition rates will decrease with increase in noise-creating factors—accepted.

Comparing classification under Study 1 conditions (static surface), we found the highest recognition accuracy with no cover on the firm, flat substrate case. The lowest performers were dense fur and curved, foam substrate. In Study 2 (dynamic surface, heavy versus no cover), conditioning across each of surface and motion factors had minor effect on recognition rates (all ≈88%).

With models trained on individuals, Study 1 showed little change in gesture prediction rate compared to all-data models. Study 2 individually-trained results are more similar to other studies, which also report training on single-condition data [33, 52, 68, 98].

Cover-substrate-motion: Fig. 2.2 shows overall gesture recognition accuracy by study and condition set.

We assessed relative noise levels by calculating effect sizes of significant conditions. Cohen's d reveals a large effect (|d| ≥ 0.8 [23]) with the introduction of curvature (vs no substrate) and fur and short minky (vs no cover) in Study 1. Large effects (|d| ≥ 0.8 [23]) from Study 2 were from introducing the cover (regardless of motion), and from the combination of having motion and cover.
Interestingly enough, adding motion by itself produced a very low effect (d ≤ 0.08). Further investigation into the interaction between cover and motion on pressure readings included Tukey's HSD of adjusted p-values to clarify the significance of stratified factors. While all other combinations remained significant at p < 0.05, the case of varying motion in the presence of a cover was alone insignificant at p_adj = 0.7.

A confusion matrix (Fig. 2.3) indicates how gestures were misclassified. In both studies, the most-misclassified was tickle.

Participant: We classified gestures with models trained by participant. In Study 1, mean accuracy was 89.1% (max = 94.6%)⁶. Models trained on all Study 1 data were accurate at 90.0%, i.e. within 1% of the mean accuracy of the individual-trained models. This indicated that training on participants did not improve recognition when data was not conditioned on noise-creating factors.

For Study 2 (fewer noise factors) we found a greater effect for models trained on participants (mean = 86.5%, max = 97.3%)⁷. Training across all data gave 82.1% accuracy.

The motion × cover condition had an overall 79.4% recognition rate. Training on the subset of data with the most challenging conditions (in-motion, with-cover) still produced a higher recognition rate when using individual-trained models (mean = 85.7%, min = 73.7%, max = 95.1%).

⁶Study 1 gesture recognition accuracy by participant: P1-93.0%, P2-83.8%, P3-85.0%, P4-92.6%, P5-93.2%, P6-88.0%, P7-94.6%, P8-91.7%, P9-86.0%, P10-83.4%
⁷Study 2 gesture recognition accuracy by participant: P1-90.2%, P2-86.6%, P3-86.6%, P4-91.1%, P5-81.3%, P6-86.1%, P7-84.8%, P8-79.5%, P9-95.5%, P10-79.5%, P11-90.2%, P12-93.8%, P13-97.3%, P14-83.0%, P15-79.5%, P16-79.5%

Figure 2.4: Cohen's d effect sizes of participant by gesture for each study.

We compared mean pressure of gesture behaviours by individual versus that of the entire pool (i.e. how P1 performed scratch versus how all participants performed scratch).
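The effect sizes reported here follow the standard pooled-standard-deviation form of Cohen's d; a minimal sketch (our own illustration, not the study code) comparing one sample of pressure readings against another:

```python
import math

def cohens_d(a, b):
    """Cohen's d with pooled standard deviation, e.g. one participant's
    mean gesture pressure vs. the pool's (as in Fig. 2.4)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled = math.sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (ma - mb) / pooled

# Means 3 apart with unit pooled SD give |d| = 3, a large effect (|d| >= 0.8).
assert cohens_d([1, 2, 3], [4, 5, 6]) == -3.0
```

By the convention cited in the text [23], |d| ≥ 0.8 is treated as a large effect.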
All instances were significant at p < 0.05 (Cohen's d effect sizes reported in Fig. 2.4).

Toucher Recognition

H2: Variability in gesture execution will be higher between subjects than within subjects—partially accepted, for the case of data compared within the same noise conditions.

The ability to recognize the toucher may have great impact on reading emotional state. We compared performance in participant classification for models trained across the entire dataset with those trained on the 6 meaningful gestures of our gesture set (omitting no touch). We also look at accuracy rates on data collected in the most realistic condition (in-motion, with-cover).

Recognition rate by study: We compare recognition rate by study and gesture in Fig. 2.5. Study 1 achieves an overall accuracy rate of 78.5% (chance 10%), but for models trained by gesture, a mean of 87.9%. The highest contributing gesture is constant at 92.7%, followed by pat at 88.9%.

Using all Study 2 data, participant recognition was 80.3%. Training by gesture again showed recognition improvement; constant was best at 93.8%, followed by pat at 89.8% (mean, all 6 gestures: 85.4%).

Conditioning on only the in-motion, with-cover factor (referred to in Fig. 2.5 as Study 2b), average participant recognition rates are 89.8%. Further splitting data to additionally train models by gesture does not provide additional improvement in mean performance (85.2%); this time pat is the highest performer (93.8%) and constant a close second (90.6%).

We again refer to effect sizes (Fig.
2.4) to consider the role of pressure in participant recognition; individuals making different gestures exhibit considerable variation in pressure patterns.

2.4 Discussion

We discuss our findings in direct response to questions posed in Section 2.1.1.

Q1a: Potential accuracy of sensor in gesture recognition

Unsurprisingly, we found the highest recognition rate (94.8%) for the case of no covering and a flat, stiff, stationary surface (Study 1); these are the least demanding conditions and the ones we expected to perform the best.

Figure 2.5: Mean subject recognition rates by gesture and study over 10 trials; error bars omitted as ∆ across trials < 0.01% in each case. Study 2b refers to the 'in-motion, with cover' condition.

In evaluating the degree to which noise factors degraded performance, we expected the noisiest conditions to be in Study 2: a moving, curved, springy surface under a heavy fur cover. This achieved an 88.6% recognition rate of our 6 gestures and 'no touch', among the lowest we observed. However, at just under 90%, this value is still usably high. Further work is required to assess the impact of nonuniform motion, as well as unknown gesture segmentation boundaries in less controlled conditions.

Q1b: Potential accuracy of sensor in user differentiation

Our studies show that the ability to pick a particular 'toucher' out of a known group varies by gesture. A priori knowledge of a condition also improves prediction accuracy, jumping from 80.3% trained over all data to 89.8% when trained on in-motion, with-cover, the noisiest condition. To see how this may change over the various gestures, we refer to Fig. 2.5, which ranks constant and pat as most identifiable. Fig.
2.4, which compares the effect size of pressure reading by participant and gesture, reveals that there are many large effects for the constant gesture. This focus on pressure suggests that there may be revealing variations in individual 'heaviness of hand'.

Q2a: Impact on accuracy due to cover, substrate and motion

Over our two studies, we examined variations in cover thickness, substrate stiffness and curvature, and motion. Summarized in Fig. 2.2, we now discuss the impact of these factors individually.

Cover: The effect of a cover on classification performance is significant; more so than the underlying motion (as noted in Section 2.3.3). Fig. 2.2 further illustrates this. Regardless of whether we partition our data by cover on/off or motion present/absent, we achieve gesture recognition of at least 88.1%, 6% higher than training overall (82.1%).

The pressure applied over a denser, heavier fur cover may muffle some of the lighter touches and degrade transmission of touch pressure and/or location, thus confusing some gestures.

Another possible explanation could be the added familiarity that the cover affords. For example, according to one subject, "When it had the fur on, I had a more pleasant experience... Without the fur, I found it difficult to touch it." (S7) This opinion was expressed in some form by 10 of 16 Study 2 participants. More research is needed to determine if the fur invited more naturalistic touching.

Substrate: Compared to a flat, hard surface, a flat foam substrate decreased recognition accuracy by about 1% (Fig. 2.2a). It had slightly less impact than curvature or, comparing to Study 2, than motion.
Given the sensor's piezoresistive construction, we anticipated the effect of a firmly compliant backing to be small; this finding confirms that a somewhat springy underlying surface (helpful for conveying the sense of an animal body as well as a pleasant tactility) is feasible under a large-body touch sensor.

Motion: The relatively small effect size of motion in raw frame data is unexpected. However, in the context of Tukey's HSD results (with a cover, the motion effect is insignificant), we gain some further insight into just how small the effect of regular periodic motion is, and we confidently rank motion noise behind that of a cover. This is very promising for the larger premise of reliable touch sensing on a flexing surface.

Interaction of motion and cover: There is a large effect size for the interaction between cover and motion, which is absent in recognition performance conditioned on added noise factors (Fig. 2.2). This consistent improvement over training on all data (overall at 82.1%) suggests that these large effect sizes of noise interference have little effect on recognition as long as we train and test on the same condition.

Figure 2.6: Top features as selected by Weka for each study. Classification tasks are Gesture and Subject, by Location (Cx, Cy) and Pressure features. Features selected at under 25% frequency in 20-fold cross validation are omitted.

Q2b: Gesture Recognizability

Gesture confusion patterns reveal a considerable range of misclassification (the more saturated cells in Fig. 2.3). In Study 1, the most commonly misclassified gesture is rub as tickle; in Study 2, scratch is most misclassified as rub. Both pairs are commonly executed as quick back-and-forth motions. This may be related to relative gesture pressure by individual: gestures like constant, generally more stationary, are predicted consistently and also indicate a larger effect size by pressure (Fig. 2.4).
Quick motions being lost in the heavier covering may also contribute to these errors.

Q3a: Feature Utility

Computational economy is key to real-time recognition. Prioritized feature selection allows us to focus on high-performing dimensions. To help understand relative feature utility in our recognition tasks, we used Weka's Attribute Evaluator function to find the highest-weighted features for the random forest model (Fig. 2.6).

The feature set with the greatest ability to differentiate gestures related to pressure variance; meanwhile, location variance facilitated toucher recognition. People's touch signatures may vary more in physical location range, but a gesture may be better characterized using pressure when the toucher is known (Fig. 2.4).

These results suggest that using a subset of the features described here could increase computational efficiency, depending on the priority of the recognition task needed and the variance exhibited by an actual data pool. Meanwhile, evaluating the performance of a reduced feature set is difficult due to the lack of a benchmark for comparing accuracy rates [51].

Q3b: Computational viability of real-time gesture recognition

The conditions evaluated here approached realism in some respects, specifically that of sensor covering, substrate, and underlying motion. Our post-hoc analysis indicated that a modern microprocessor could keep up with both sampling and recognition.

Our setup fell short of realism in at least one important factor: people are unlikely to perform distinct, discrete gestures with well-defined boundaries. A different computational architecture will be required to handle this problem (a topic of ongoing work). However, at present, computational load is dominated by sampling rather than recognition, an overhead cost that will not necessarily change with real-time use (unless more selective sampling can be employed based on observed patterns of touching).
It is thus quite likely that a more capable recognition engine will also be feasible with comparable computational resources. In situ real-time recognition may be better approximated by a toucher-independent Leave-One-Out (LOO) sliding window. Our work uses k-fold CV as the more conservative accuracy rate (as compared to LOO), as we do expect a calibration process in which toucher behaviour is learned. Until we optimize, the best window size is unknown.

2.5 Conclusions

The results described here represent an initial feasibility assessment of the impact of flexing surfaces on gesture recognition performance. We found recognition rates from 80–95% for optimal to noisy conditions when distinguishing between social touch gestures relevant to interacting with a small touch-centric robotic entity. We further found an ability to distinguish the individual toucher at a rate of 78.5% and 80.3% in Study 1 and Study 2 respectively. In the noisiest case (also the most realistic), training by condition increased participant recognition accuracy to 89.8%. The next step is evaluating more comprehensive sets of movement conditions.

The implication of a sensing system able to detect both individual touch and toucher is considerable. For example, a sensor able to differentiate between users could provide a personalized set of experiences or controls.

Further, identifying the touch brings us closer to differentiating affective intent [50]; identifying the toucher may allow us to qualify their touch behaviour. A sensor loaded with a personal touch profile could determine how far an individual deviates from that profile on a given day, and infer emotional status. To build such a profile, it will be important to establish the dimensions of a touch signature.

2.6 Future Work

We foresee many ways in which to extend this work.

More extensive movement conditions: The present study employed steady periodic motion of an underlying surface for a flexible sensor.
A more general, and potentially challenging, environment will include irregular and unexpected motions.

Continuous gestures: The single-gesture samples of this study removed the need to segment data in pre-processing. In future, an algorithm will not know of gesture boundaries or length a priori, and will need to handle the case of seamlessly transitioning gestures. One approach is to run several sampling windows of different lengths to search for varying touch activations, at the cost of increased computational load. Future work needs to explore this and other architectures to determine a strategy to optimize for computational efficiency.

Pragmatic gestures: In this study, participants were instructed to perform a particular named touch gesture, but not with communicative intent or emotion context. The semantics of a "natural" touch will be dependent on the context of the situation and the user's own state; to determine communicative intent, it may be necessary to observe other factors as well.

Our participants often varied in how they interpreted a given gesture, both between participants, and individually between and within conditions. For the latter, we suspect users may have performed more authentic gestures on the moving, fur-covered robot than when it was flat, stationary and/or uncovered. We also observed differing touch behaviour from the beginning through the end of one capture, but our sensing mechanisms are unable to distinguish these cases.

Gesture stabilization and system interactivity: Finally, with more efficient algorithms deployable in realistic conditions, we plan a longitudinal study of long-term interactions in natural settings to investigate how individual gestures change over time as a toucher learns to interact with the sensing system.

2.7 Acknowledgments

We would like to thank the Natural Sciences and Engineering Research Council of Canada (NSERC), SurfNet and the UBC SURE program for providing partial support of this work.
We also thank Michael Phan-Ba for engineering the firmware and robot control, Philippe Kruchten for his mentorship, and finally, Oliver Schneider for the excellent title.

Chapter 3
Affect Detection

Automatic affect detection allows machines to extend beyond communication via instructional gestures to access more contextual cues. Emotions are expressed through many physical channels, including (but not limited to) physiology [62], facial expressions [60], eye behaviour [48, 74], speech [73], vocal prosody [59], and even touch [43]. Since machine recognition of emotional expression through touch is still largely unsolved and as yet unreliable [4], we use multimodal sensing to triangulate affective signals by leveraging gaze and biometric support of touch. In Chapter 2, we established that the custom-built sensor performed at literature levels for gesture classification (80%–95%), so we proceed with the same sensor here.

In a real use case, a therapy robot pet would need to recognize truly expressed emotional touch in the absence of gestural or emotional direction, contrary to previous classification tasks [4, 43] where participants were asked to convey emotion as a proxy to personal experience. Therein lay our first challenge: collecting this kind of truly experienced emotion in the lab as training data. We turned to an interesting emotion recall technique: relived emotional memories can elicit strong biometric responses [32] reminiscent of the original emotional experience. Thus we designed our study around recall of emotionally intense memories and captured this affect-imbued data using our custom touch sensor, supported by commercially available gaze and biometric sensors as well as self-reports.

Abstract

Efficient, unobtrusive machine recognition of human affect will be a key component in interactive systems that must respond to human emotional state - e.g., robot therapy, assisted-driving systems and emotion-aware game development.
Because affective communication occurs through many modalities, algorithmic recognition of affect requires the flexibility to sense and integrate information from multiple sources. Considering the application of therapeutic interaction with a robot pet, we look here at alternative nonverbal modalities known to reflect affect: mainly touch (measuring force magnitude and location), supported by gaze (location) and biometric indicators (skin conductance, blood volume pulse, respiration). We collected a training data series (N=30) from all three modalities and looked for emotion-reflecting features and linkages between them.

For training data that reflected true experienced emotion, we asked participants to relive intense emotional memories while touching a stationary, furry robot, eliciting authentically experienced emotions under controlled lab conditions. Touch data was collected using a sensor embedded under the robot's fur; gaze data was recorded using an eye tracker next to the robot; and biometric data was tracked using sensors attached to the participant's body. Each participant interacted with the robot while experiencing two opposing emotions: Stressed-Relaxed or Depressed-Excited.

Targeting improved emotion classification from integrating touch with gaze and biometrics, we extended past touch classifiers to include features from the frequency domain. We report accuracy results from a random forest classifier built on 100 trees with 20-fold cross validation and leave-one-out using Weka, an open-source machine learning program. Partitioning our dataset by level of system knowledge of participant information [labelled, unlabelled, leave-out] and across time-varying windows [2s, 1s, 0.5s, 0.2s], we found classification accuracy rates improve with increasing system knowledge of the participant.
Labelled participant data on adjacent windows achieved accuracy rates as high as 92% on touch data alone, and unlabelled, untrained participant data as low as 27% (chance 25%).

The wide range in classification accuracy suggests possibilities for in situ implementation for a known user base. Adding the modalities of gaze and biometric data improves accuracy only when a participant's instances are present in both training and test sets. To address the computational efficiency required of dynamic and adaptive real-time interactions, we analyze relative subsets of our multimodal feature set, and provide design recommendations for our therapeutic robot.

3.1 Introduction

Social interfaces such as robots, smart cars or game systems must facilitate complex and believable interactions with human users such that the machines appear to respond or adapt to human social cues [34]. Because people prefer to interact with machines as they do with other people [34], it is important for systems to understand human social cues that can carry emotional significance, including nonverbal channels such as facial expression, body pose, social touch, eye focus and vocal prosody. Humans have evolved or learned naturally during early social development to use such cues to distinguish between emotions such as distress and happiness [1]. Machines must be explicitly trained to do this in different social contexts where the affect-expressing channels may vary [1]. Examples of applications with different social contexts are given below:

Entertainment: Emotion manipulation plays an important role in how we experience computer gaming media, but player responses are individual and vary over time. A system able to, for example, detect excitement and startle could enhance the experience for players who enjoy being challenged or scared, through automatic, personalized changes in difficulty.
Technology for detecting emotional state could be embedded in a gaming controller and TV screen.

Assisted driving: In assisted driving it is currently difficult to determine when the system can take over driving safely. Assessing a user's momentary attention and stress could be a key to this problem. If the car could sense increases in arousal, it might control the environment to calm the user down if needed. Sensors could be built into the steering wheel, mirrors, and seat to assess emotional state and focus.

Social robot therapy: Affective therapies for treatment of anxiety require systems that can sense human social cues. A number of touchable robots such as the baby harp seal Paro [102] and the teddy-bear-like Huggable [95] have been developed for this purpose. Studies with the Haptic Creature [110] specifically investigated human affective touch of, and affective display by, a zoomorphic robot animal that had an array of touch sensors embedded in it.

In this paper, we focus on enabling emotion recognition in the last application example – social touch robots for therapy purposes. Earlier work indicates that interaction with such robots can affect human emotional state: Sefidgar and MacLean [88] showed that motion of the Haptic Creature lowered anxiety in users who were stroking it on their laps. However, Sefidgar and MacLean [88] did not set out to investigate automatic human affect recognition with the robot. This could be a key to providing benefits comparable to the physiological benefits demonstrated by animal-assisted therapy [8, 10, 71, 81] – especially valuable where human patients are unable to engage with actual therapy animals.

Using touch as the primary interaction modality leverages the natural inclination for physical contact to represent emotional closeness while minimizing invasive sensing. Previous work has shown that affect-related information of human-animal robot interaction can be extracted from touch gestures such as stroking and rubbing [4].
However, there is doubt as to whether classifying gestures is helpful in detecting true emotional state – intuitively, knowing whether a gesture was a 'stroke' or a 'rub' may not supply deterministic information about the emotional state of the user while performing that gesture. Furthermore, these studies collected intent-style data, where the emotions were expressed to a sensor, and not experienced by a participant. A therapy robot should recognize a user's emotion as it unfolds; thus the model must be built on participants who are truly experiencing the emotions being studied.

Because recognition of human affect from touch data alone is a challenging task, we included two alternative supporting modalities that could potentially improve recognition performance: biometrics and gaze. Our choice of these two modalities over other alternatives such as vocal prosody and facial expressions was motivated by suitability to our robot pet application. Earlier work has shown that touch interaction with a robot pet can decrease heart and respiration rates [88]. This suggests that sensing and utilizing such biometric responses during interaction could make recognition of human affective state more accurate. Additionally, gaze is a good indicator of visual attention, determining whether a user is focusing on the robot pet, and has been used successfully in predicting affect: Jaques et al. [48] demonstrated that users' gaze focal points on a computer display are related to feelings of curiosity and boredom.

3.1.1 Approach and Research Questions

We investigate how gaze and biometrics could support recognition of affect through touch interaction with a robot pet.
With touch as the central modality, we also chose analysis methods that originate from social touch gesture classification [33, 51] and borrow features calculated from force magnitude and location (pressure-location domain) for finding a baseline for emotion classification in touch, as well as pressure characteristics in the frequency domain. By adding established signal processing from biometric and gaze classification methods, we hope to reveal how emotion classification is affected by the inclusion of gaze and biometrics.

Finding an optimal machine learning feature subset becomes combinatorially intractable with increasing modalities and accompanying statistical features¹. However, this is an important endeavour for the eventual end-goal of the creation of a real-time, automatic emotion modelling system. So we must consider data collection window size, sampling density, and feature set to reduce computational load and classification error². Towards this goal, we investigate three avenues: (1) various collection windows and (2) sample adjacency by accuracy, and (3) feature subset performance by selection popularity. Each of these variables is explained in more detail with its respective research question.

RQ1 – Touch with Multimodal Support: How accurately can we classify emotional state based on combinations of gaze and physiological data with touch data?

Multimodal datasets likely provide a more complete picture than touch alone, due to asynchronous activation or interaction information. We expect classification accuracy to improve with increased modality

¹Note that our current method of feature creation grows roughly in O(|F| · |M|), where F is the set of statistical features and M is the set of modalities.
²For a random forest classifier, having extra features that contribute little information can decrease accuracy rates if they are randomly selected. We have also found that certain features oppose each other, i.e.
including both in a subset decreases accuracy, when each has high information gain individually.

support. Gaze and biometric data, both known to encode affective content [48, 57], help to round out emotional signals from touch.

RQ2 – System Knowledge of Individual: How important is system calibration to the user for affect classification?

Past social touch gesture recognition results suggest that individuals have distinctive ways of interacting with touch sensors that make recognizing identity surprisingly accurate [19, 33]. This suggests that a system that has learned user behaviour may be better at gesture recognition. Leveraging this result for affect, we perform classification across three different levels of system knowledge of the participant (hereby referred to as participant knowledge) and discuss results.

Recognition rates are likely to increase alongside greater participant knowledge: participant-labelled data where instances from the same individual are in both training and test sets will yield the highest classification accuracy; alternatively, the lowest predictive accuracy occurs where testing and training are performed on different individuals.

RQ3 – Sample Density: Is classification robust to interruptions in signal or sample size in continuous sampling?

Outside of polling rate, we define sample density across two parameters: (1) window size and (2) window proximity. Window size represents a time interval of continuously sampled data; larger windows cover a longer time snapshot and are thus more likely to capture distinguishing emotional characteristics over a behaviour. Window proximity refers to the increased likelihood of neighbouring time series samples sharing more characteristics than distant samples.
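These two sample-density parameters can be made concrete with a small segmentation sketch (our own illustration; window size in frames, plus an optional gap of dropped frames so adjacent windows are never evaluated together):

```python
def segment(stream, win, gap=0):
    """Split a continuous frame stream into consecutive windows of `win`
    frames, optionally dropping `gap` frames between windows so no two
    windows are adjacent in time."""
    out, i = [], 0
    while i + win <= len(stream):
        out.append(stream[i:i + win])
        i += win + gap
    return out

frames = list(range(54 * 10))                       # 10 s at 54 Hz
assert len(segment(frames, win=108)) == 5           # five adjacent 2 s windows
assert len(segment(frames, win=108, gap=108)) == 3  # with 2 s gaps between windows
```

Gapped segmentation discards data but removes the optimistic bias of testing on a window that borders one seen in training.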
We examine the influence of window size and window proximity by aggregating data instances over four different window sizes and comparing classification accuracy on the same data set, with gap (dropping 2s of data between windows so that adjacent windows are not evaluated together) and without gap (adjacent windows are included in the training and test sets).

We posit that, across both parameters, reducing sample density reduces classification accuracy, with the worst performance occurring for small windows with gapped data.

RQ4 – Feature Analysis: Which features or sets of features provide the best chance of high accuracy rates?

Increased computational load and resultant latency are the cost of multimodally-enabled triangulation on parameter truth, and potentially undermine the real-time feasibility of such a technique. To optimize these tradeoffs, we analyse each of our features in terms of repeated occurrence in the automatically-selected best feature subset. Finally, we assess which features, from a superset containing both the pressure-location domain and the frequency domain, should be included in a strong feature set.

Traditional touch and gaze features span the spatial and temporal domains; we add spectral distribution statistics to capture artifacts in interaction frequency, and posit that classification accuracy will improve with the addition of frequency-domain features.

3.1.2 Contributions

Through answering our research questions, we contribute the following:

• Affect classification performance from combinations of touch, gaze, and physiology data in experienced-emotion interactions;

• Feature analysis distinguishing the relative recognition contributions of feature subsets, to maximize the tradeoff between multimodal benefits and computational load, as well as a recommendation of data features from the frequency domain and the traditional pressure-location domain in an emotion classification context;

• Practical recommendations on the use of affect classification in three example scenarios with respect to system knowledge of the user.

The remainder of this paper is structured as follows. We survey previous work, motivating our choice of emotion elicitation method and including a history of affect classification from each of touch, gaze, and biometrics. We then describe our experiment in detail, including our data post-processing procedures. Results are presented from all data partitions, targeting the influence of multimodal data vs. touch alone, participant knowledge, sample density, and feature set, along with a review of the emotional experience based on participant reports. Finally, we discuss our findings and ground this work in relevant application implications.

3.2 Related Work

Here we review past methodologies, motivate the choice of relived experience to target our emotion set, provide a history of affect classification by modality, and note cautions.

3.2.1 Emotion Set

We take Russell's circumplex model of affect as our starting point, where arousal (activation) and valence (pleasantness) are orthogonal axes [83].
Although Russell's model is widely used by emotion researchers, there are some inherent limitations to discretizing and labeling the two-dimensional space, as we must assume that: (1) emotion labels will be interpreted consistently by every participant at any time; and (2) the axes are truly orthogonal.

Consider the emotional context of approaching the axes or origin when working with such a model:

• it may not be sensible to reach (0,0), presumably a state of full neutrality. Similarly,

• it may be absurd to talk about independent movement directly along the axes: can one really have an increase in arousal without any change in valence?

As such, many emotion researchers [27, 42, 105] opt to discretize the 2D space into a grid and rotate it by 45°, such that experimental materials and tasks are aligned with the diagonal axes, namely (high arousal, high valence) ↔ (low arousal, low valence) and (high arousal, low valence) ↔ (low arousal, high valence).

The literature provides little consensus on which emotion labels to employ, making comparison between studies of even common modalities problematic. Understandably, papers utilizing gaze information use attention-related emotion sets – e.g., Anxiety, Boredom, Confusion, Curiosity, Excitement, Focus, Frustration [84]. Work on human recognition of human affect from touch tries to span the human experience, namely Anger, Fear, Happiness, Sadness, Disgust, Surprise, Embarrassment, Envy, Pride [43]. Yet another method is to partition Russell's affect grid into discrete labels: touch work uses nine³, of which biometric work uses a subset⁴. We chose four emotion labels that minimally cover Russell's emotion model while avoiding overlap in label interpretation: Stressed, Relaxed, Excited, and Depressed.

3.2.2 Emotion Elicitation

A fundamental problem with eliciting emotions from research participants [22] is consistently producing valid emotions in a lab setting.
Our motivating application involves a situated social robot that must react to authentic human emotions as they occur, which poses a challenge in contrived laboratory settings (“Feel angry as you interact with our robot”). To circumvent these artificial emotion barriers, past work has typically asked participants to perform the easier task of simulating intended emotions (“Imagine feeling anger, then express it to our robot”). For example, to collect the data used in [110] and [4], participants were presented with a list of emotions that they had to express by touching a robot. Unfortunately, this does not directly equate to experiencing an emotion, and may not accurately represent the intensity of an experienced emotion. Consider the difference between a smile and genuinely feeling happiness: the former lives in a public space, where the one smiling can convey an emotional intent to others; the latter is private, the experienced emotion available only to the individual.

Experienced-emotion studies are more difficult to conduct and are rarer. Playing entertainment media, specifically emotionally evocative music and/or video, has been employed as an emotion elicitation method [57]; however, selecting the media to be applied as an emotion treatment introduces difficulties.

³ Emotions for classification by touch: Distressed, Aroused, Excited, Miserable, Neutral, Pleased, Depressed, Sleepy, Relaxed [4].
⁴ Emotions for classification by biometrics: Stressed, Excited, Depressed, Relaxed [57].
Although validated sets of emotional media exist, they either rely heavily on getting the mood ‘just right’ (music), or demand a high level of attention and divert gaze (video). Further, cultural and individual differences have great and unpredictable influence over emotional reactions to either medium.

To mirror our scenario of interacting with a social touch robot, we asked participants to recall and/or retell an emotionally intense memory (an approach supported by [32, 62]) while interacting with a furry, touch-friendly robot. Specifically, participants were asked to recollect a memory in which they had strongly experienced a specific emotion, and to recall or tell the story of that experience to the robot while touching it. To verify that participants felt the intended emotional state, we used a self-report scale on which they rated their current feeling before and after each task.

3.2.3 Modalities

Affect classification has been explored in each of the modalities considered here, all with distinct emotion elicitation procedures and labels.

Touch

Touch data can be quickly dissected into force magnitude (or pressure) and location – dimensions which are used for gesture recognition as well as for control directives (e.g., using trackpads and touch screens). Social touch gestures have been studied, with prediction accuracy ranging from 59% [51] to 86% [33] depending on collection and classification methods, which, like affect, have no consistent standard. Still, the high prediction accuracy rates achieved on defined gesture sets suggest that these touches can be used as directives in systems with embedded recognition.

Accurate recognition of true emotional data appears to be more difficult: even human recognition of human emotion in touch does not reach gesture recognition rates, with highs of 59% (chance 8%) [43]. Machine classification has demonstrated 36∼48% accuracy (chance 11%) [4], depending on inclusion of participant knowledge.
Both of these studies collected intent data, not experienced (relived) emotion.

Gaze

An interactant's eyes give affect cues which are discernible with eye tracking technology. Partala and Surakka [74] studied the effect of emotional auditory stimulation on pupil size variations; they found that negative and positive stimulation resulted in significantly larger pupil dilation than neutral stimulation, but could not differentiate stimulus valence. Other factors, such as changes in luminance, can also affect pupil dilation.

An alternative is to analyze where a person is looking. Jaques et al. [48] tracked students' gaze as they interacted with a graphical intelligent tutoring system; gaze features such as fixations and saccades revealed that curious and bored students looked at different interface areas – for example, engaged students looked more at the table of contents. Overall, boredom and curiosity could be predicted with 69% and 73% accuracy, respectively.

To our knowledge, no studies have investigated whether gaze point is useful in classifying emotions using the two-dimensional model of valence and arousal. Compared to pupil size variation measurements, gaze point can be measured in a less controlled environment (lighting and luminance changes impact data quality less) with relatively inexpensive tracking technology. Thus, we utilize the Cartesian coordinates of the user's gaze point in our own classification analyses.

Biometrics

Biometric signals such as blood volume pulse (BVP), skin conductivity (SC) and respiratory rate (RR) have been widely used for multimodal emotion recognition in a variety of contexts. Examples include facial expressions [60], affective audio [57, 69], gaze behaviours [45], and touch behaviours [88]. In addition, heart rate variability has been utilized in emotion classification [5, 6, 49, 88] and as a biophysical indicator of cardiological health [76].

Our work follows in a tradition of earlier explorations into multimodal emotion recognition.
The biometric signals that we chose were blood volume pulse (BVP), skin conductivity (SC), and respiratory rate (RR). Using these three basic signals, we calculated a set of derived signals consisting of heart rate variability (HRV) features, breathing rate variability (BRV) features, and cross-signal features such as heart beats per breath. Studies where emotion elicitation is based on true experience and uses the same emotion sets (as in [57]) are most appropriate for comparison. [57] uses validated music excerpts to generate authentic responses across four musical emotions (positive/high arousal, negative/high arousal, negative/low arousal, positive/low arousal), reporting affect recognition rates between 70% and 95% (chance 25%), with higher rates where participant knowledge is included.

3.3 Methods

We asked participants to recall emotionally intense experiences while interacting with our stationary robot pet. We sampled touch, gaze and biometric sensors, with self-reports of emotion collected before and after each emotion task. Of the 30 participants recruited from across campus, 14 identified as female, 18 had corrected vision, and the mean age was 25.4 years (σ=5.4 years). Participants were compensated $20 for their time.

In the following subsections we give details of the experimental setup, introduce the procedure, and describe our data post-processing techniques.

3.3.1 Experimental Setup

Configuration and Room

Participants were positioned in a half-prone position on a couch to reduce large-scale movements while ensuring comfort (Figure 3.1). For valid data collection, our gaze and touch tracking systems needed to be within close range; for valid emotion elicitation, participants had to be comfortable enough during memory recall to express the focal emotion. The experiment was conducted in a sparsely furnished, medium-sized office with a window, with the participant's back to the door.
The experimenter was present except during the emotionally intense parts of the session, as described below.

Figure 3.1: The experiment setup: the participant sits comfortably supported by pillows, facing the gaze tracker with her hand on the touch-sensitive surface of a stationary robot. Biometric sensors are worn around the waist (respiratory rate), thumb (blood-volume pulse), and index and ring fingers (skin conductance) of the resting hand. One camera captures eye movements and another is raised on a tripod behind the participant to capture hand motions over the robot. Both cameras have audio disabled for privacy. When the participant pulls the rope, a ball leading outside the room indicates to the experimenters that the emotion task is complete.

Touch Sensor on a Passive Robot

We used a custom flexible touch sensing apparatus, previously described in [19], that has been validated to detect 5g∼1kg of weight over a 10×10 inch area at one taxel per square inch⁵. We chose fingerpad-size taxel resolution, similar to that of earlier work [4, 33], since emotion tasks in touch generally incite broader movements [43]. Higher resolution sensors have had great success in high-precision tasks such as those seen in touch screens, trackpads, or teleoperative mimicry, and have useful applications in robotic arms [93]; however, they are massively overqualified in terms of computational load and resolution for our purposes, where low cost and high sensor malleability are crucial.

Forming a 10-by-10 grid, this device can sense multiple simultaneous touches (so-called multitouch), registering varying pressures on each taxel scaled to 1024 levels and polling at 54Hz. This results in 54 frames of 100 cells per second, each cell reading a touch pressure value between 0 and 1023.

The touch sensor was installed on a stationary and unresponsive furry object, roughly the size and weight of a football.
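A single frame from this sensor, and the per-frame pressure parameters used later in our feature extraction (framesum and the pressure-weighted centroid), can be sketched as follows. This is an illustration only: helper names are ours, and the centre-of-mass weighting is our assumption of the standard computation.

```python
def frame_parameters(frame):
    """frame: 10x10 list of taxel pressures, each 0-1023 (one of 54 frames/s).
    Returns framesum and the pressure-weighted (row, column) centroid."""
    framesum = sum(sum(row) for row in frame)
    if framesum == 0:          # no contact: centroid is undefined
        return 0, None, None
    row_c = sum(r * p for r, row in enumerate(frame) for p in row) / framesum
    col_c = sum(c * p for row in frame for c, p in enumerate(row)) / framesum
    return framesum, row_c, col_c

frame = [[0] * 10 for _ in range(10)]
frame[2][3] = 600              # press spanning two adjacent taxels
frame[2][4] = 200
```

Tracking these three parameters per frame, rather than all 100 cells, is what keeps the downstream statistical feature set small.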
The sensor was affixed by velcro to a substrate made of soft-shelled binder plastic⁶, then covered with a uniformly-textured, short, soft minky fabric⁷ described as “pleasant to touch...[and] reminded me of my chocolate lab's head” – P4. Participants were instructed to touch the top surface of the furry robot (Figure 3.1). All sensors were wired through the robot platform to minimize visual clutter and connected to a single laptop. Figure 3.2 illustrates the robot construction. While the robot is capable of motion, we disabled actuation for this study to reduce confounds from novelty effects, sounds, or the expectation of a reactive robot.

⁵ Built from commercially available piezoresistive and conductive fabric, available at www.eeonyx.com.
⁶ Specifically, a Wilson-Jones Accohide flexible cover binder: http://amzn.to/29AiYbx.
⁷ Minky is a soft, short-pile fabric often used to make baby blankets; examples can be found online, including http://amzn.to/29L2DBv.

Gaze and Biometric Sensors

We sampled gaze behaviour via a Tobii EyeX gaze tracker⁸ at a rate of 60Hz – similar to our touch collection rate. The Tobii EyeX tracker was chosen because it has been used successfully in a number of studies to track users' gaze location when looking at a computer display or tablet [55, 61, 70]. It is also small enough that we could place it below the robot at an angle where it could see the participant's eyes while they touched the robot (Figure 3.1). Although no specific instructions were given regarding gaze direction, participants were informed that gaze data collection worked best when they faced forward and did not make large body movements.

⁸ Tobii EyeX gaze trackers are available at http://www.tobii.com/xperience/products/.

We collected three biometric signals using the Bio-Graph Infiniti Physiology Suite⁹, namely Blood Volume Pulse (BVP), Skin Conductance (SC), and Respiratory Rate (RR), all collected at 2048Hz. Following established procedures for derived signals, these were expanded to include a set of Heart Rate Variability (HRV) features, Breathing Rate Variability (BRV) features, and cross-signal features such as heart beats per breath. The cross-signal features were automatically calculated by the Thought Technology Physiology Suite.

⁹ Manufactured by Thought Technology Ltd.; the FlexComp ∞ SA7550 Hardware Manual is available through the manufacturer's website at http://bit.ly/29A5NIC.

Figure 3.2: Our robot was constructed from pliant plastic sheets actuated by a pulley, covered with a custom-built sensor, and finally wrapped in a furry fabric to invite touch. It remained stationary during the course of this study to eliminate any effect of participant reaction to robot motion.

Participants were first outfitted with a respiration band worn around the chest – a close fit that did not impede breathing. Once they were comfortably seated, we positioned the BVP sensor at the thumbpad. Finally, the SC sensors were positioned on the index and ring finger pads. Both the BVP and SC sensors were held in place by a small velcro band on the right hand, which was not used for touching the robot.

Video Data Collection

We took video recordings of the participant's hands and face to help in data analysis in case of missing gaze or touch data. No sound was recorded, out of respect for privacy. The camera recording the hands was placed behind the participant, and another camera was placed to the participant's right to record the face (see Figure 3.1 for placement of the gaze camera).

3.3.2 Procedure

Table 3.1 summarizes the study procedure. The main part of the experiment was run in alternating stages: (1) a neutral task, (2) the first emotion task, (3) a neutral task, (4) the second emotion task.
Emotion tasks were counterbalanced across participants.

Table 3.1: Experimental Procedure Summary

  Procedure              Description (imposed duration)             Data Recorded
  Introduction           Describe experiment procedure              none
                         Apply sensors                              none
                         Verify stable biometric readings           none
  Neutralization 1       Read neutral text (5 min)                  Biometrics
                         Self-report emotional state
                         Calibrate gaze tracker and touch sensor    Calibration logs
  Emotion 1              Recall memory                              Biometrics, Gaze, Touch
                         Self-report emotional state
  Neutralization 2       Read neutral text (5 min)                  Biometrics
                         Self-report emotional state
                         Calibrate gaze tracker and touch sensor    Calibration logs
  Emotion 2              Recall memory                              Biometrics, Gaze, Touch
                         Self-report emotional state
  Gesture                Calibrate gaze tracker and touch sensor    Calibration logs
                         8 randomized gesture tasks (10s each)      Gaze, Touch
  Debrief and Interview  Interview                                  Qualitative data
                         Self-report emotional state

Introduction and Calibration

The participant was welcomed, the consent process administered, and the sensing equipment set up. To reduce novelty effects, we introduced the robot, invited touch exploration, and described the construction of the robot, including its sensing abilities. We explicitly indicated that the robot would not move throughout the study. The data reading range was then checked for each sensing modality in a calibration phase.

Neutralization and Self-report

For each stage, we first presented an emotionally neutralizing reading task in which the participant read a short technical report from a technology magazine. The participant then reported their current emotional state by marking a sheet of paper displaying Russell's [83] 2-dimensional scale of arousal and valence [22]. During this period, the participant was instructed to keep their hands still and not touch the robot. Definitions of arousal and valence were provided, and an experimenter answered any participant questions about the nature of the scale. This self-report was repeated before and after each neutralizing and emotion task.
For emotion tasks, a rating of the genuineness of the experienced emotion was also included.

Reliving Emotion Task

The participant was next instructed to recall an emotionally intense memory pertaining to a given emotion word {Stressed, Excited, Relaxed, or Depressed} and, while comfortably seated, to interact with the furry sensing robot. They were instructed to relive the emotion as intensely as possible and to describe the memory and associated feelings to the robot, in whichever language they were most comfortable with and at a volume of their choosing. We reminded them that all audio recording on the video cameras was disabled and that we could not hear them speak from outside the experiment room. Task completion was indicated by pulling a rope that led to where the experimenters were waiting. Data was collected over the course of a single recalled memory (duration µ=4.23 min, σ=3.09 min). Once the participant pulled the rope to indicate that they had completed the emotion task, the experimenter returned and administered the self-report grid.

The participant then proceeded to the second set of neutralization and reliving-emotion tasks. The emotion of the second task was determined by the first: participants experienced, in counterbalanced order, either Stressed–Relaxed or Depressed–Excited. Due to the taxing nature of the emotion task, we chose to use only two emotions per participant, a decision reached through discussions with field experts, piloting, and literature review. Since feeling genuine emotion can take effort, there was significant concern about fatigue effects. The condition sets for the neutralization and emotion tasks are fully summarized in Table 3.2.

Gesture Task

To verify touch data quality, participants then performed a series of nine touch gestures, for 10s per gesture, on the touch sensor. This gesture set was chosen from Yohanan et al.'s Social Touch Dictionary [110] and is consistent with the tasks as published in Cang et al.
[19], using the definitions described in Table 3.3.

Table 3.2: Summary of the condition sets for Neutralization and Emotion tasks. For example, one participant session consisted of the R1 neutral task, the Stressed emotion task, the R2 neutral task, and the Relaxed emotion task.

  Task     Description                               Order                N=30
  Neutral  Two neutral texts {R1, R2}                (R1, R2)             16
                                                     (R2, R1)             14
  Emotion  4 emotion labels [83]; each participant   Stressed–Relaxed     6
           recalled “opposing” emotions;             Relaxed–Stressed     8
           counterbalanced by order                  Depressed–Excited    8
                                                     Excited–Depressed    8

Table 3.3: Touch gesture instructions as provided to participants [19].

  Gesture         Definition Provided
  no touch        no contact with the sensor (control)
  constant touch  contact without movement
  pat             quick & gentle touches with the flat of the hand
  rub             moving the hand to and fro with firm pressure
  scratch         rubbing with the fingertips
  stroke          moving hand repeatedly
  tickle          touching with light finger movements

Debrief and Interview

Finally, we conducted a short interview and debrief, including a final self-report to ensure that the participant felt comfortable and emotionally stable. The latter was added after we found in piloting that participants could become very distraught during these sessions. The entire experiment took approximately 60 minutes.

Data Processing and Feature Extraction

To check against sensor and/or data degradation, touch gesture data collected during the Gesture Task as described in Table 2.1 was processed identically to that in [19], where the same touch sensor was attached to a slightly larger robot¹⁰. The results of that earlier work were consistent with the current gesture data, showing that gestures such as stroke and rub could also be differentiated with the touch sensor mounted on a smaller robot. Thus, the touch sensor data quality should be sufficient for classification of affect-related touch data in the current study.

The recorded touch, gaze and biometric data was used to calculate features that could be used in affect classification.
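The per-window statistical features used in this work follow a common pattern: summary statistics computed over each windowed parameter. A hedged sketch (our formulas for total variance and area under the curve are assumptions for illustration; the exact definitions are not spelled out here):

```python
from statistics import mean, median, variance

def window_stats(values, dt):
    """Seven summary statistics for one windowed parameter (e.g. framesum).
    dt is the sampling interval in seconds (1/54 for the touch sensor).
    'total_variance' as sum of squared deviations and 'auc' via a simple
    rectangle rule are our assumptions, not the study's stated formulas."""
    m = mean(values)
    return {
        "min": min(values),
        "max": max(values),
        "mean": m,
        "median": median(values),
        "variance": variance(values),
        "total_variance": sum((v - m) ** 2 for v in values),
        "auc": sum(values) * dt,
    }

stats = window_stats([0, 2, 4, 2], dt=1 / 54)
```

Applied to the three touch pressure parameters per window, this pattern yields the 21 touch features described below.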
We included conventional touch features [4, 19, 33] – min, max, mean, median, variance, total variance, and area under the curve – for both pressure and touch centroid (x,y). We then extended these features to gaze focal location (x,y) and to the biometric channels of blood volume pulse (for heart rate), skin conductance, and respiration rate. As there may be more information than magnitude and direction in touch and gaze behaviour [4], we also extracted frequency-domain features, specifically fundamental frequency, amplitude, and peak count – for touch: frame-level pressure, frame-level centroid (x,y), and the window's nine most visited cells as traced by the centroid; and for gaze: point of focus on the surface (x,y)¹¹.

¹⁰ For our purposes, the larger robot was not feasible, since it had poor sensor coverage on key parts of its body and could not be easily fixed in relation to the eye tracker.
¹¹ Biometric channels also included automatically calculated higher-order frequency features that came pre-packaged with the Thought Technology physiology suite, so further frequency calculations on biometric data were not needed.

We down-sampled biometric data from the original 2048Hz to match the touch and gaze sampling rates of 54Hz–60Hz. All features were then calculated over different window sizes – 2s, 1s, 0.5s, 0.2s – both with and without gaps (see Figure 3.3).

Pressure-location Domain

Emotional touch feature extraction reprises known analysis procedures for social touch gesture recognition, constructing three pressure parameters: framesum (the sum of all taxels in each frame), and row-centroid and column-centroid (weighted measurements of the centre of mass) [19, 33]. These parameters are split into windows, where a 2s window contains 2s of data, or 108 frames (54Hz × 2s; Figure 3.3). Seven statistical features (min, max, mean, median, variance, total variance, area under the curve) were calculated for each parameter, for a total of 21 touch features.

Figure 3.3: The data recorded during a single emotion task is referred to as a capture. To determine the effect of varying window size on accuracy rates, we tested 2s, 1s, 0.5s, and 0.2s windows. Assuming that there may be similarities learned when a model is trained and tested on directly adjacent windows, we also compared accuracy rates of data with and without imposed gaps that remove adjacent window instances.

Gaze used a total of 34 features: mean, min, max, median, and variance of each of the focal coordinate pair (X-, Y-location), saccade length, velocity, and fixation duration, as well as the window summary features of sample count, sample count on robot, samples off robot, sample ratio, within-range rate, saccade count, saccade rate, fixation count, and fixation–saccade ratio. We used Salvucci's I-VT algorithm [87] to differentiate between fixations and saccades: gaze samples with point-to-point velocities lower than 30 degrees/second were classified as fixations, and samples with velocities of 30 degrees/second or higher were classified as saccades.

Biometric features included mean, median, and variance across all channels provided by the Thought Technology physiology suite, including both the base signals (BVP, SC, RR) and higher-order channels dependent on the original signals (e.g., HR, HRV, IBI, etc.), for a total of 228 features across 76 channels. Across all three modalities of touch, gaze, and biometrics, our maximal pressure-location feature set contained a total of 283 statistical features.

Frequency Domain

As biometric features extend far beyond these simple statistics, we elected to compute frequency-domain features for touch alone and for touch and gaze interaction. Inspired by [4]'s use of max amplitude and most-activated-cell features in the frequency domain, we extended the set.
From the original touch dataset, we separated each collection instance into windows, and from each window we performed a Fast Fourier Transform (FFT) of the frame-level pressure and the centroid coordinates (x,y). Tracing the centroid, we found each frame-centroid cell as well as its eight surrounding cells, and took the FFT of each of these nine cells. From these combined 12 parameters, we computed six window statistics – spectral peak count, fundamental frequency (Hz), max amplitude, mean amplitude, and variance and total variance of amplitude – resulting in a total of 72 touch features in the frequency domain.

We extended the same six window statistics to gaze data, on the 2D focal location, yielding 12 gaze features.

Across touch and gaze, our maximal frequency-domain feature set thus contained 84 statistical features. When we include the 21-feature touch pressure-location set to assess all touch features, this number increases to 105 statistical features.

Classification

Consistent with earlier work in touch classification [4, 19, 33], we use Weka, an open-source machine learning application [41], to run 20-fold cross-validation (CV) with a random forest classifier of 100 trees, reporting results in terms of classification accuracy on both the pressure-location and frequency-domain features. Whether for 20-fold CV or Leave-One-Out, classification accuracy is defined as the ratio of correctly classified instances to all instances. We use random forest as it has been shown to be effective in touch classification tasks in past studies [4, 19, 33, 51].

A summary of analysis techniques is given in Table 3.4.

Table 3.4: A motivating overview of analysis techniques.

(a) 20-fold cross-validation
  Description: The data is randomly partitioned into 20 equal-sized blocks (folds). One fold is held out as the test set, and the other 19 folds are used as the training set. 20 tests are run, one for each fold, and the results are averaged.
  Implication: Since the data is randomly partitioned, all participants are likely to appear in both the test and training sets.
  Question: How can we expect a system with prior knowledge of participants to perform?

(b) Leave-one-out
  Description: One full set of samples is left out of the training set and kept as the test set. Since each participant did two emotions, we left two participants out for each LOO test. Results are generated per left-out pair.
  Implication: Since the data is systematically partitioned, two participants are completely left out of the training set for each run.
  Question: How can we expect a system with no prior knowledge of participants to perform?

Varying window sizes informs real-time classification, in case we choose to limit how much data to collect before a prediction task. Windows for capturing gesture data have been recommended at 2s [33]; however, human hand and finger expression is much quicker, at roughly 100–200ms [72, 92]. Therefore, we divided our data into increasing window sizes of [0.2s, 0.5s, 1s, 2s] to provide insight into how data length influences classification performance. Larger window sizes reduce the interrupts required and thus overall computation time. We also assess the influence of removing adjacent windows by adding 2s gaps between classification instances and comparing performance against the case where all windows are included.

In decreasing order of knowledge level, we included participant labels, excluded participant labels, and finally used a version of Leave-One-Out (LOO) wherein we trained a classification model on a dataset with the test participants removed. In other words, if we are classifying Participant 3's data instances, Participant 3 is absent from the training set and appears only in the test set.
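This leave-one-pair-out partitioning can be sketched as follows (participant identifiers are illustrative; group sizes follow Table 3.2, where 14 participants did the Stressed–Relaxed pair and 16 did Depressed–Excited):

```python
from itertools import product

# Hypothetical IDs: group A covers Stressed/Relaxed, group B Depressed/Excited.
group_a = [f"A{i}" for i in range(14)]
group_b = [f"B{i}" for i in range(16)]
everyone = set(group_a) | set(group_b)

splits = []
for pa, pb in product(group_a, group_b):
    test = {pa, pb}              # one from each group covers all four emotions
    train = everyone - test      # the test pair is fully held out
    splits.append((train, test))
```

This yields the 14 × 16 = 224 unique train/test partitions reported, one per complementary participant pair.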
To ensure that we cov-ered all four emotions in each run, test sets were comprised of two complementaryparticipants. Since each participant performed two emotions, we had two mutuallyexclusive groups of emotions. Combining one from each group gives us 16 × 14combinations or 224 unique test and training sets for each modality set and win-dow size. We aggregated for modality combination and window size and reportclassification accuracy.The emotion classification tasks varying participant knowledge can be thoughtto represent real-world applications:20-fold CV was done with and without participant labels. Training the modelwith participant labels simulates a system that knows whose emotions it is attempt-ing to classify, i.e. has some a priori knowledge of the user.20-fold CV without participant labels can be read as a simulation of a classi-fication task where the interactive system’s emotion model has been trained on allpossible users before attempting classification, though not given explicit indicationof the current user. Imagine a system that lives in a limited private domain, suchas family or classroom context, where all users have gone through a calibrationperiod. The calibration data would be the training set for the model.In contrast, the Leave-one-out analysis can be read as a simulation of a clas-sification task where the interactive system’s emotion model cannot be trained onall possible users. This could work for a system that lives in a public domain,such as in a museum or institutional context. The training set for the model wouldnecessarily not include all possible users, since they would not be known ahead oftime.We also ran 20-fold CV classifying on participant labels to determine not onlyhow well these feature sets can determine what interaction was performed, but alsowho performed it.59Table 3.5: A motivating overview of analysis conditions.(a) Window sizeDescription Data was all sampled to 54Hz. 
Window size is the length of time over which a feature is calculated. E.g., a two-second window has 108 samples.
Implication: With a static sample speed, shorter windows simulate a system with a faster update cycle. There will be less information per window, but a faster system response.
Question: How can we expect accuracy rates to change with different sample sizes?

(b) Gap/without gap
Description: With no gap, all windows are calculated contiguously, i.e., every window is directly adjacent to the one previous. With gap, after every window is calculated, two seconds of data is thrown out.
Implication: Previous work has shown that a touch gesture takes a little under a second to make, so a two-second gap increases the likelihood that each window captures different gestures.
Question: How does a system trained on more homogeneous/heterogeneous data instances perform?

(c) Participant labels in/out
Description: With participant in, participant labels were one of the features the system could randomly select.
Implication: With participant in, the system can tell whose emotions it is attempting to predict. With participant out, the system still has knowledge of the participant's behaviour, but cannot tell from whom the behaviour came.
Question: If the system is calibrated on a set of individuals, does it need a priori identification of an individual to predict emotions well?

3.4 Results

This section describes our results from running classification using the traditional pressure-location touch features on a multimodal data set, as well as our addition of Frequency Domain features on touch and gaze data.
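Before the results, the system-knowledge conditions of Table 3.5(c) can be prototyped concretely: the participant label is just one more feature column for the "labels in" condition, is dropped for "labels out", and a leave-one-group-out split covers LOO. The sketch below uses synthetic placeholder features and scikit-learn (assumed available); it is not the thesis pipeline.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
n = 400
X = rng.normal(size=(n, 10))               # placeholder touch/gaze features
participant = rng.integers(0, 20, size=n)  # who produced each instance
y = rng.integers(0, 4, size=n)             # four emotion classes

clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Participant labels in: identity is an extra feature column.
acc_in = cross_val_score(clf, np.column_stack([X, participant]), y, cv=20).mean()

# Participant labels out: same instances, identity column dropped.
acc_out = cross_val_score(clf, X, y, cv=20).mean()

# Leave-one-out: all of one participant's data held out of training per split.
acc_loo = cross_val_score(clf, X, y, groups=participant,
                          cv=LeaveOneGroupOut()).mean()
# With purely random features, all three should hover near chance (0.25).
```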
For both feature sets, we found similar patterns, where:
• Increasing the number of modalities improved accuracy rates,
• Increasing window size had little effect on accuracy rates,
• Decreasing the amount of participant information in the training set worsened accuracy rates,
• Leave-one-out analysis performed at or near chance, and
• Participant classification performed comparably to emotion classification.

In each case, we return to our research questions and describe the feature subset run on emotion and participant and the nuances from the result set. We also address the performance of feature set analysis, reporting best performers for emotion classification. Finally, we detail participants' self-reports of their personal experiences during the emotion expression tasks.

3.4.1 Pressure-location Feature Set: Emotion Classification

Pressure-location touch features have performed well in earlier touch gesture recognition tasks [19, 33]. To compare the relative classification accuracy with frequency features, we report results from random forest classifiers using 20-fold cross-validation (CV) on varying degrees of system knowledge by four differing window sizes. We aggregate across combinations of modalities: touch-only, touch/gaze, touch/biometrics, and touch/gaze/biometrics, reporting on the relative impact of each modality set. Biometric classification accuracies are included both for reference to accuracies found in past work [57], and as a check against the performance of touch and gaze features.

Table 3.6 is separated into two sets of classification tasks where (1) instances were adjacent and therefore likely to be similar and (2) instances were separated by 2s to reduce similarity; results show better performance where instances were adjacent.

Figure 3.4: Classification performance by modality set, level of system knowledge (where labels in indicates with knowledge), window size, and with/without gaps. The Emotion row displays results from classifying emotion, with and without Participant labels. The Participant row displays results from classifying participant, with and without Emotion labels. For both cases, increasing the number of modalities improves accuracy rates, where the inclusion of biometrics provides the strongest classification rates. However, in the most rigorous test of emotion classification, Leave-one-out (LOO), all modalities perform at or near chance for all window sizes. Regardless of classification task, window size has a small or positive effect on accuracy rates, except for the case of 2s-windows-with-gap. This is likely due to a high decimation of data, i.e., 2s-windows-with-gap has 13% of the number of data instances of 1s-windows-with-gap.

Table 3.6: Results from Classifying Emotion using 20-fold CV on pressure-location features for Touch (T), Touch + Gaze (T+G), Touch + Biometrics (T+B), and Touch + Gaze + Biometrics (T+G+B)

(a) No Gap between instances, participant labels in
Window  T      T+G    T+B    T+G+B  Count
0.2s    92.29  94.41  100    100    73835
0.5s    92.14  93.94  100    100    29950
1s      92.13  93.71  100    100    14995
2s      91.84  93.05  99.96  99.97  7435

(b) With 2s Gap between instances, participant labels in
Window  T      T+G    T+B    T+G+B  Count
0.2s    87.13  88.41  99.99  99.97  6713
0.5s    88.15  89.68  99.93  99.98  5990
1s      88.56  90.96  99.9   99.92  4999
2s      76.33  76.48  93.79  93.05  676

(c) No Gap between instances, participant labels excluded
Window  T      T+G    T+B    T+G+B  Count
0.2s    75.42  82.2   99.99  100    73835
0.5s    76.11  82.35  99.99  99.99  29950
1s      76.77  82.15  99.98  99.97  14995
2s      76.15  81.45  99.93  99.88  7435

(d) With 2s Gap between instances, participant labels excluded
Window  T      T+G    T+B    T+G+B  Count
0.2s    66.75  70.36  99.87  99.82  6713
0.5s    69.4   73.62  99.77  99.73  5990
1s      71.93  75.88  99.8   99.76  4999
2s      56.66  60.06  89.79  90.09  676

(e) Leave-participant-out Pairs
Window  T      T+G    T+B    T+G+B
0.2s    30.62  30.04  30.14  29.78
0.5s    31.95  31.65  27.24  27.86
1s      32.19  33.08  27.83  27.75
2s      33.05  34.47  29.14  28.22
Gaze and biometric support greatly improves the accuracy rate, in line with previous work showing high classification performance on physiological data [57]. Generally, increasing window sizes provides more information. However, we see a drop-off at 2s. We believe this is due to a reduction in training instances; for example, in Tables 3.6b and 3.6d, the instance count of 2s windows with gap is only ∼13% of the number of instances of the next window.

As can be seen in Tables 3.6a–3.6b, we report accuracy rates between 65 and 100%, depending on window size and modality combination. Results are presented in order of decreasing participant information inclusion in the training set, i.e., participants labelled, participants unlabelled, leave-participants-out (LOO).

Tables 3.6c–3.6d show a similar pattern as with participants labelled (above). Note that with less participant information, Touch and Touch+Gaze performance degrades; however, the inclusion of biometric data maintains stable accuracy rates.

In our most rigorous test, LOO, high classification rates would indicate generalizable emotional expression across the population. However, this is not evident from our feature set under these experimental conditions as, regardless of modality, results from Table 3.6e approach chance (25%).

The bar graph of accuracy rates from all Leave-Participants-Out test sets is shown in Figure 3.5. Mean rates approach chance (25%), suggesting that the average individual's behaviour during emotional tasks is not generalizable. Interestingly, there are many individuals who may conform to the group or directly oppose it, as can be seen in the extrema of the distribution.
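The per-participant summary behind Figure 3.5 amounts to grouping LOO runs by left-out participant and taking the mean and SD of accuracy. A sketch with invented accuracy values (chosen only to echo the P22/P05 pattern; these are not the study's numbers):

```python
import pandas as pd

# Invented per-run LOO results: one row per (left-out participant, window size)
runs = pd.DataFrame({
    "participant": ["P05", "P05", "P22", "P22", "P13", "P13"],
    "window_s":    [0.2,   2.0,   0.2,   2.0,   0.2,   2.0],
    "accuracy":    [0.11,  0.13,  0.52,  0.55,  0.24,  0.26],
})

# Mean accuracy and SD per participant, pooled over window sizes and pairings
summary = runs.groupby("participant")["accuracy"].agg(["mean", "std"])
print(summary)  # P22 well above chance (0.25), P05 well below
```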
For example, we note that P22 is exceptional in that when P22 is included in the test set with any other participant, emotion classification was consistently well above chance.

3.4.2 Frequency Domain Feature Set: Emotion Classification

An earlier study by Altun and MacLean [4] reported promising results using two frequency domain features (peak frequency and corresponding cell index) over the classic pressure-location touch features. Here we replicate and extend their feature set over both touch and gaze modalities. Biometric data was not included here since physiology feature calculations have far more extensive feature sets beyond these frequency domain calculations [6, 57].

Figure 3.5: A bar graph of accuracy rates for every LOO test by participant. Most of the accuracy rates are roughly at chance (25%), and low SD suggests the variance of this classification was low across window sizes. Notice a few interesting outliers: test sets including P22 and P05. P22 has much higher classification rates, suggesting that P22 may be similar enough to the group to have their emotion behaviours consistently identified. Conversely, P05's particularly low rate of classification suggests that P05 expresses emotions contrary to the group.

We again calculate features across the same four window sizes and compare the influence of removing adjacent windows. Results are presented based on data from Touch Frequency (TF), Touch Frequency/Touch pressure-location (TF+T), and Touch Frequency/Touch pressure-location/Gaze Frequency (TF+T+GF).

Repeating the same classification tasks from the pressure-location feature set, we compare how frequency features impact recognition of emotion.

With participant labels, we see the highest performance in cases with high similarity and high information: adjacent (without gap) instances of multi-modal, large windows. Without participant labels, the loss of adjacent windows (with gap) has a large negative impact on recognition.
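The two frequency-domain features in question (peak frequency and the corresponding cell index) can be sketched as below, assuming each instance is a window shaped samples x taxels sampled at 54 Hz; this is an illustration, not Altun and MacLean's exact implementation:

```python
import numpy as np

FS = 54  # Hz, the sample rate used in this study

def freq_features(window):
    """Per-taxel peak frequency, plus the index of the taxel ("cell")
    whose spectrum holds the strongest peak overall."""
    spectrum = np.abs(np.fft.rfft(window, axis=0))
    spectrum[0] = 0  # drop the DC component
    freqs = np.fft.rfftfreq(window.shape[0], d=1 / FS)
    peak_freq = freqs[spectrum.argmax(axis=0)]      # one value per taxel
    peak_cell = int(spectrum.max(axis=0).argmax())  # strongest taxel index
    return peak_freq, peak_cell

# A 1 s window (54 samples) over 4 taxels; taxel 2 oscillates at 5 Hz
t = np.arange(54) / FS
window = np.zeros((54, 4))
window[:, 2] = np.sin(2 * np.pi * 5 * t)
peak_freq, peak_cell = freq_features(window)
print(peak_freq[2], peak_cell)  # → 5.0 2
```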
Table 3.7b shows that even in the best case (multimodal, largest window), the added gapping results in large accuracy loss. With neither participant labels nor test participant data (LOO), results again approach chance (25%), similar to the pressure-location feature set.

Table 3.7: Results from Classifying Emotion using 20-fold CV on Touch Frequency (TF); Touch Frequency + Touch pressure-location (TF+T); Touch Frequency + Touch pressure-location + Gaze Frequency (TF+T+GF) feature sets.

(a) With no gap between instances, participants in
Window Size  TF     TF+T   TF+T+GF
0.2s         78.04  91.83  91.88
0.5s         86.48  93.57  93.48
1.0s         81.67  90.58  90.88
2.0s         92.98  95.79  96.15

(b) With 2s Gap between instances, participants in
Window Size  TF     TF+T   TF+T+GF
0.2s         70.69  83.7   83.9
0.5s         83.1   89.64  90.25
1.0s         76.91  85.74  85.64
2.0s         82.52  89.07  89.49

(c) With no gap between instances, participants out
Window Size  TF     TF+T   TF+T+GF
0.2s         78.04  76.68  76.96
0.5s         86.48  81.9   81.93
1.0s         81.67  75.35  76.38
2.0s         92.99  90.32  90.38

(d) With 2s Gap between instances, participants out
Window Size  TF     TF+T   TF+T+GF
0.2s         42.71  62.36  62.76
0.5s         67.67  76.07  76.86
1.0s         51.08  67.32  68.16
2.0s         58.54  74.17  75.5

Figure 3.6: Feature selection popularity of pressure-location features of Touch and Frequency features of Touch and Gaze.

3.4.3 Participant Classification

The sharp decline in classification accuracy associated with eliminating participant information from training sets in Leave-One-Out needs further examination. To this end, we performed 20-fold CV predicting on participant for both feature sets (see Table 3.8 for pressure-location features and Table 3.9 for frequency features).

High accuracy rates on participant prediction for pressure-location features suggest that individual differences are highly expressed in behavioural data examined at this level. We see a greater negative impact on accuracy from removing emotion labels (Table 3.8a vs. 3.8c) than from removing adjacent instances (Table 3.8a vs. 3.8b).
However, compared to the pressure-location results in Table 3.8, frequency features perform weakly for predicting participant.

3.4.4 Feature Set Analysis

Participant information makes a notable difference in classification accuracy of emotion across all window sizes and adjacency. In order to examine how the classic touch features of pressure-location compare to the extended Frequency features on Touch and Gaze, we ran Weka's Attribute Evaluator using 20-fold CV of the Best First search method on the combined Touch and Gaze feature set as described in Section 3.4.2. Note that beyond participant information, the location-median is the most commonly selected feature, though when normalized by feature count per parameter, pressure-based features are the most popular, followed by location-based features. Frequency-based gaze locations are next most popular. Figure 3.7 breaks down each parameter and their relative popularity, where each cell represents the number of features of a statistical type selected at each iteration. The most often selected features were the 11 calculated medians of touch location, which were chosen 100% of the time during 20-fold CV.

Table 3.8: Results from Classifying Participant using 20-fold CV on pressure-location features for Touch (T), Touch+Gaze (T+G), Touch+Biometrics (T+B), and Touch+Gaze+Biometrics (T+G+B)

(a) No Gap between instances, emotion labels in
Window Size  T      T+G    T+B  T+G+B  Count
0.2s         84.66  90.48  100  100    73835
0.5s         85.63  90.94  100  99.99  29950
1s           86.18  91.24  100  100    14995
2s           86.24  91.61  100  99.99  7435

(b) With 2s Gap between instances, emotion labels in
Window Size  T      T+G    T+B    T+G+B  Count
0.2s         75.57  81.68  99.93  99.96  6713
0.5s         78.88  84.36  99.97  99.95  5990
1s           81.5   87.04  99.94  99.98  4999
2s           67.31  71.3   98.52  98.22  676

(c) No Gap between instances, emotion labels excluded
Window Size  T      T+G    T+B    T+G+B  Count
0.2s         71.96  81.98  100    100    73835
0.5s         73.64  83.28  99.99  99.99  29950
1s           75.43  84.61  100    100    14995
2s           75.74  84.87  99.96  100    7435

(d) With 2s Gap between instances, emotion labels excluded
Window Size  T      T+G    T+B    T+G+B  Count
0.2s         62.4   70.06  99.88  99.91  6713
0.5s         66.04  74.77  99.88  99.88  5990
1s           69.87  77.96  99.9   99.86  4999
2s           54.59  64.35  98.08  98.22  676

Table 3.9: Results from Classifying Participant using 20-fold CV on Touch Frequency (TF); Touch Frequency + Touch pressure-location (TF+T); Touch Frequency + Touch pressure-location + Gaze Frequency (TF+T+GF) feature sets.

(a) With no gap between instances, emotions in
Window  TF     TF+T   TF+T+GF
0.2s    53.76  86.8   87.21
0.5s    71.86  90.15  90.48
1.0s    64.25  86.36  87.24
2.0s    87.68  94.82  95.36

(b) With 2s Gap between instances, emotions in
Window  TF     TF+T   TF+T+GF
0.2s    42.46  75.74  76.41
0.5s    67.35  84.86  84.88
1.0s    57.21  79.4   80.99
2.0s    66.87  84.8   85.65

(c) With no gap between instances, emotions out
Window  TF     TF+T   TF+T+GF
0.2s    40.67  76.92  77.97
0.5s    63.37  83.71  84.78
1.0s    50.42  77.78  79.61
2.0s    83.31  91.84  92.24

(d) With 2s Gap between instances, emotions out
Window  TF     TF+T   TF+T+GF
0.2s    31.43  63.45  65.85
0.5s    60.06  76.98  78.86
1.0s    44.1   69.12  71.35
2.0s    53.43  76     78.38

3.4.5 Experienced Emotion Trajectory and Interview Results

Participants provided self-reported emotional state data with respect to Russell's 2D affect grid during two neutralization tasks and following two emotion tasks. Paired t-tests showed no significant difference (p > 0.05) between the neutralization tasks nor an order effect between the emotion tasks. We therefore ignore emotion order for subsequent analysis.

In paired t-tests, we found significant differences in self-reports between neutral and emotion tasks for each of Stressed, Depressed and Excited in both arousal and valence (p < 0.05). The Relaxed task did not show significance. A plot of emotion trajectory shows participants' starting state and the movement across the 2D affect grid (Figure 3.8).

Both high arousal emotions (Excited, Stressed) were consistent with expectations: participants reported a shift in emotion toward the corner of the grid represented by the target emotion word.
For some participants, the immediacy or recency of the recalled events really helped to highlight these emotions. As this experiment was run towards the end of term, it coincided with final exams and holiday reunions, both cited as reasons for ease of recall.

“I’m leaving to see my family for the first time in three years, I can’t stop being excited.” – P8

“Excited was easy – the situation was more recent and was more important [than my Depressed memory].” – P22

“I have a lot of school assignments right now and I kind of toggled between many memories [for Stressed]. It was hard to pick one to feel but I think that might have added to the feeling.” – P21

“...[W]hen I was doing Stressed, I felt like I wanted to punch something it was so gut-wrenching.” – P29

Figure 3.7: Feature selection popularity by statistic. Pressure-based touch features are most popular, followed by location-based touch features, with gaze frequency features on location data third, when normalized against the total number of features available by modality.

The low arousal emotions, Relaxed and Depressed, moved as expected in valence but not arousal, which remained overall at its neutral “resting” position. In the case of Relaxed, this might be explained by perceived similarity between this emotion task and the ‘resting’ start condition.

“Relaxed was easy to express because it’s pleasant and I want to feel it and also, I’m sitting on a couch which helps.” – P28, corroborated by P2, P18

For these two emotions, some participants reported that the emotion Depressed was linked to Stressed in their memories (e.g., feeling stress about exams was also depressing), which may explain some of the unexpected movement in arousal for Depressed. Four participants also reported feelings so strong that their Depressed memory evoked active tears, while others indicated that these feelings were somewhat mitigated by the experience of stroking a soft body.

“My [Depressed] memory was very clear and I was able to recall a lot of details.
It really helped to be touching a soft thing and felt like it was taking some of my sadness.” – P29, corroborated by P15, P23

Another possibility for both of these emotion targets is that participants were simply unable to turn down their arousal state to this degree during the relatively short time of the session.

Each participant self-reported a genuineness rating of how authentically they experienced the target affect in each emotion task. On a scale of 1–10, with 1 being completely contrived or artificial and 10 being completely authentic as in the original experience, participants rated µ = 8.29 (σ = 1.38) when Depressed; µ = 8 (σ = 1.41) when Excited; µ = 7.5 (σ = 1.68) when Stressed; and µ = 7.5 (σ = 1.51) when Relaxed.

Figure 3.8: Changes in individuals' self-reports of emotion after Neutralization (start) and Emotion tasks (finish); N=14 for Stressed & Relaxed and N=16 for Depressed & Excited. Overall, we see a move from the origin to the representative quadrant. Stressed and Excited show the strongest overall change along both Arousal and Valence axes. Relaxed shows the least change, with disconnected points referring to “no change” from the neutral state.

3.5 Discussion

In this section, we return to our research questions and address our suppositions and research methodology.

3.5.1 Findings

Classification accuracy improves with increased modality support – Accepted.

The accuracy rates of biometrics in CV suggest that any time biometrics can be artfully employed, they should be. For example, skin conductance sensors may be useful for many touch systems. However, touch and gaze performing at roughly literature accuracy rates when participant knowledge exists suggests that emotion classification systems based on just these modalities would be feasible for at least some applications. Due to the invasiveness of some biometric sensors, such a system has many possible advantages.

The biometric signal features were used as a check against touch and gaze features.
For example, in LOO, biometric features performed just as poorly as touch and gaze features, which suggests a high rate of individual differences in the features we have calculated. While there may be generalizable features available in biophysical signals, we are not certain that we have found them here.

Recognition rates will increase alongside greater participant knowledge – Accepted.

Low LOO results imply low generalizability of an individual's emotional behaviour, at least when touch, gaze, and biometrics are chosen as the nonverbal channels of analysis. Thus, any system that plans to perform emotion classification is advised to include all expected users in the training pool. As long as users are included as part of a calibration period, the system does not require explicit user identification, as evidenced by the relatively strong performance of unlabelled participants over LOO. In cases where the highest accuracy is needed, the system should obtain a priori participant identification, as seen in the performance using participant-labelled data.

Reducing sample density reduces classification accuracy – Partially accepted.

From Figures 3.4 and 3.6, increasing window sizes from 0.2s to 2s improves classification under no-gap conditions. However, when we introduced gaps between instances – where windows were not adjacent – a drop-off in accuracy occurs at 2s windows. Upon closer examination, we notice that the added gaps for 2s windows also decimated the instance count by more than 90%, from 7435 down to 676 instances (see Tables 3.6b and 3.9b).

In general, larger, adjacent windows result in marginally higher classification accuracies and can be useful in adjusting parameters to system requirements. The relatively minor reduction in accuracy with adjacent versus non-adjacent instances suggests that the same individual touches similarly within the same period of emotion expression.
It may be possible to decrease computational load during data collection as continuous capture may not be necessary when using short window instances. However, seeing how instance count may influence classification performance, we recommend balancing window sizes with capture length such that short windows are used with systems that require fast response time.

Classification accuracy will be improved with the addition of frequency-domain features – Rejected.

At 54Hz, the frequency domain features did not provide a significant improvement in classification performance on touch data. However, the selection of gaze frequency features (see Figure 3.7) suggests that there may be a benefit for gaze data. We looked into frequency features in part due to others' success using frequency-based features [4]. However, with our low sample rate coupled with short windows, this does not appear to be salient for this classification task. Furthermore, frequency domain features require some extra processing by way of Fourier transformations over the standard pressure-location set prior to model building; therefore, where processing time is a priority, the standard feature set may be preferred.

3.5.2 Experimental Methodology

Our experimental methodology reflected our primary imagined use case: a zoomorphic robot designed for therapy. It had several elements that were not standard, including the emotion elicitation method, the choice of emotions investigated, study framing (the setup itself, with the participant interacting with an unresponsive furry object), and various aspects of the analysis.
With results in hand, it is relevant to critique each of these aspects for its ability to produce valid data in general, and insights on our research questions in particular.

Emotion Elicitation
We were surprised at participants' intensity of expression during the emotion elicitation tasks; although the technique was validated by literature, we elicited stronger emotional reactions from our participants than we expected. The method was shown to be valuable for an experiment run in a laboratory, where people can otherwise find it hard to act as they would in more natural settings. We believe that some variation on recall as a method for eliciting emotion can be employed in future studies on our robot.

Emotion Set
Although we have reported both high and low classification accuracy rates, we have some skepticism over whether accuracy rates are a good indicator of a successful emotion model. There is certainly value in an accurate system, but there are some underlying assumptions of a discrete classification model that we question.

Here we assume that participants express a roughly steady-state emotion, felt across the entire memory recall. However, it is possible that strong emotions may be felt only for an instant before autonomic emotion regulation or coping mechanisms take over [39]. The horizon over which we sample a participant's emotional state, and the assumption of immediacy, has direct impact on what kind of decisions we would want an interactive system to implement. Our discrete classification system can identify differences in minute-long interactions, but cannot provide us with an accurate estimation of an emotional inflection point (i.e., transition from one emotion to another).
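As one illustrative direction (our sketch, not something implemented in this study), per-window predictions could be smoothed toward a trajectory estimate with a rolling majority vote:

```python
from collections import Counter, deque

def rolling_majority(labels, horizon=5):
    """Majority vote over the last `horizon` per-window emotion labels,
    damping momentary misclassifications into a smoother trajectory."""
    recent = deque(maxlen=horizon)
    smoothed = []
    for label in labels:
        recent.append(label)
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed

stream = ["Relaxed"] * 4 + ["Stressed"] + ["Relaxed"] + ["Stressed"] * 4
# The single spurious "Stressed" at index 4 is voted away; the sustained
# run at the end eventually flips the smoothed estimate.
print(rolling_majority(stream))
```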
A truly interactive system would need to react to the change in an emotional state and adapt over many samples.

Furthermore, when engaging in natural emotional conversation, interactions with pets or friends allow for error correction: if we are mistaken on our first interpretation, further context helps us to reassess quickly and correct our language. Working towards an adaptive model (rather than a prescriptive one) would go further into developing a meaningful relationship over a direct and immediate call-and-response instructing interaction [89]. Using touch data in context with gaze and biometric analysis lays the groundwork for extending haptic human-robot interactions from instructional directives to meaningful conversational relationships.

3.5.3 Application Implications and Future Work

We return to our example applications as a way of grounding our findings and giving concrete ideas of how they could be deployed for use.

Social robot therapy
Out of the three studied modalities, touch is the most central for social robot therapy. Our current findings indicate that as long as the user is previously known to the system, distinguishing between four different emotional states can be done quite robustly. This provides intriguing opportunities for further development of therapeutic robots that could automatically run human-affect recognition and adjust their movement based on it. For example, when the user is touching the robot in a way that communicates stress, the robot could alter its breathing behaviour to attempt to calm down the user and reduce possible anxiety.

While gaze and biometrics did improve classification, their use in practical scenarios is more challenging. For robust detection of gaze, the user must always face the robot at a certain angle or wear a calibrated head-mounted gaze tracker. Similarly, biometrics requires instrumentation before reliable readings of signals such as heart rate and skin conductance can be measured.
In contrast, touch interaction with the robot typically consists of momentary touch contact that may be too short and infrequent for measuring biometric signals through sensors embedded in the robot. However, we believe these sensory systems can be easily integrated where some considerations are met: careful sensor placement for gaze attention, and training data collection sessions.

To be used effectively in therapy, an introduction period would be required where an expert such as a therapist would introduce the robot and guide potential users in providing training data for recognition of emotions via touch. Although this does imply a setup cost for use, the potential benefits in environments where real animals cannot be used (such as some hospital environments) compensate for the initial time investment.

Entertainment
As the example with the most controlled environment, embedded sensing systems can be integrated into product development, exploiting many modality combinations. We can imagine handheld video game controllers using skin conductance and pressure data, because the touch contact with a controller is likely more continuous than that with a social robot. On the other hand, the spatial range of motion in touch gestures sensed with the smaller controller surface may be more limited; in this case, pressure features may provide richer data than location. In addition, it could be possible to develop virtual reality (VR) headsets with gaze tracking and BVP sensors at the temples. Prototyping would be possible with any existing controller that can be augmented with current commercially available sensors.

Since the emotional states experienced with games can be different from those studied in the current work, it would be vital to run further studies on employing classification models for emotional game play.
Fortunately, gaming applications have existing personalization paradigms that can be leveraged for model building: users already log on with identifying credentials, providing an a priori participant label, and in-game tutorial sessions offer an opportunity for collecting data and building a user emotion model.

Assisted driving
Cars are environments where collection of touch, gaze, and biometric data could create a sweet spot of low intrusiveness or annoyance, privacy (not obvious to an observer in or out of the car), and accuracy. Sensors for touch location, pressure, and skin conductance could be integrated into steering wheels. The seat and seat belt could be utilized for heart rate measurements, while the rearview mirror and windshield edges offer natural locations for gaze trackers.

Identification of the user would not be an issue in the automotive environment, where the number of different people using a particular car is typically low. Personal information consisting of touch, gaze, and biometric data could be stored in key fobs. However, acquiring training data would likely take time, and the assistive features based on acquired emotional data must be downplayed until enough data is collected for reliable classification. A subtle example of utilizing emotional information of the user could be to switch to relaxing music when the driver is detectably stressed.

3.6 Conclusion

In this study, we presented affect classification results from emotionally influenced touch and gaze behaviours, checked against biometric data. Across the three modalities, data was collected via: a custom piezoresistive fabric touch sensor embedded in a furry, football-sized stationary robot; a Tobii EyeX gaze tracker; and Thought Technology's BioInfiniti Physiology Suite of skin conductance, respiratory rate, and heart rate variability (by way of Blood Volume Pulse) sensors.
Our data set is composed of sensory data as well as self-reports of emotion genuineness and intensity as participants recalled intense emotional memories spanning Russell's 2-D arousal-valence affect space, namely Depressed, Excited, Stressed, and Relaxed. For models trained with test participant data using pressure-location features, the overall emotion recognition rate was roughly 83%, 87%, and 99% for touch, touch + gaze, and touch + gaze + biometrics respectively. Performance drops steeply when test participants were left out of the training model, resulting in 31%, 31%, and 29%, approaching chance (25%). We then extended the feature set by incorporating frequency features for the touch and gaze modalities, yielding 79%, 85%, and 85% respectively for touch frequency features; frequency and pressure-location touch features; and touch frequency, touch pressure-location, and gaze frequency features combined; LOO again performs poorly at 30%, 32%, and 35% respectively.

These results inform the design of a therapeutic social robot embedded with real-time emotion classification, and we make the following recommendations:

Emotional behaviour encoded in touch and gaze interaction is sufficient: While including biometric data greatly improves accuracy, current technology requires that such sensors be worn rather than embedded, resulting in a more constrained experience. Setup restrictions may interfere with natural emotional expression, and sensors affixed to the hand and body can feel invasive.

A training or calibration phase increases the model's prediction ability: Increasing participant information greatly improves the classification model's prediction accuracy.
While this stage likely requires guidance from an expert or therapist, the initial training investment facilitates the learning of user-specific characteristics and develops a more robust user behaviour model, thereby allowing for a personalized and productive experience.

Sampling density and feature count may be reduced to improve computation load: During real-use cases, the speed of classification and reaction is a serious concern. Our findings indicate that interruptions in data collection at up to 2s intervals may be tolerable. Conversely, a recognition system that detects high-frequency behaviours could dynamically update its sample density.

Although we have achieved possibly usable classification rates, our reflections lead us to believe that a categorical affect model has clear limitations that must be addressed. People do not experience emotions in isolation nor discretely; rather, emotional experiences follow a trajectory with distinctive peaks and valleys. Future detection systems need to develop models that follow the rise and resolution of an experience. While this study used a stationary robot, a deployed interactive system must acknowledge that its response has influence over the user's emotional reaction, necessitating dynamic adjustments to behaviour modelling.

Chapter 4

Behaviour Sketching

Previous chapters have focused on how a robot senses human affect and intent; now we look to close the interaction loop and consider how a machine can convey recognizable emotions to the human user.

Robots that interact directly with people will soon become commonplace [29, 66], from manufacturing [36] to healthcare and the home [14]. Such machines must function with a degree of social intelligence, and for many applications, render and react to affect via touch and physical gesture [28, 34].

Both the Haptic Creature [108] and CuddleBot [3] were created to study emotional touch and its therapeutic benefits.
They use exo- or endoskeletal, vibratory, heat, or pneumatic elements, and sophisticated signal processing and control requiring powerful computation and architecture. Their high expressive potential (via, for example, breathing, purring, hunching, and head movement) requires complicated coordination of single-element motions. Inspired by research on emotional breathing [13, 88], we zeroed in on 1-DOF breathing behaviours in two distinct robot form factors to discern what factors in motion are emotionally suggestive. Organized as a case study of periodic breathing, we sketch and have users (N=20) evaluate behaviours on palm-sized, limbless 1-DOF robots collectively dubbed CuddleBits: the flexible, furry, and fully covered FlexiBit, and the rigid, wooden, and exposed RibBit (see Figure 4.1).

Figure 4.1: The rigid RibBit (left) and fur-covered FlexiBit (right) explore very different form factors using similar actuation principles and requirements. Both can be compressed without damage, allowing for a more naturalistic haptic display.

4.1 Related Work

While physical form is suggestive of emotional traits, we borrow from other animation methods to suggest anthropomorphism and increase expressivity of inanimate objects.

Animation of emotion: Attribution theory [56] suggests that humans find agency in many objects and motions, supporting the communicative viability of very simple forms. Conveying emotion on non-humanoid forms has been a mainstay of visual animation from its beginnings, illustrated by Disney's 'sack of flour' exercise (http://tinyurl.com/pjhwrhg), where artists breathe life into a humble bag [99]. Believable emotion display does not require realistic rendering of the animal or inanimate object conveying it, only a recognisable anthropomorphic movement [99, 106].

Emotions and robot believability: To be believable social agents, robots should seem capable of emotional processing and expression [11].
Many such robots have been built in zoomorphic form with encouraging results [67, 86, 101, 107], but these forms are much more complex than a single degree of freedom.

Animation concepts have been successfully applied to physical expression or human inference of affective parameters on non-realistic everyday objects, e.g., the ambient influence of a stick-like sculpture's movement on a desk-worker's activity [47], and a physically animated phone's portrayal of emotions spanning Russell's 2-dimensional affective grid [30, 83] or an expected liveliness [75]. Expressive animations produced primarily for touching are less common.

Physiological and emotive impact of breathing: Direct physical contact with another's breathing motion affects physiology; e.g., in skin-to-skin contact therapy for premature infants it promotes physiological stabilization [65]. Similar effects are seen with touch-based social robots [103]. A robot's felt respiratory motion can reliably impart a physiologically and subjectively significant calming influence [88].

Human breathing behaviours reflect affective state [13], and breathing is an expressive visual animation tool able to convey states from drowsiness to distress. The Haptic Creature's breathing display was crucial to its ability to convey emotion [109]. The present work tests the ability of breath-like motion alone to represent a full emotional range.

4.2 Experiment Method

We recruited 20 participants (11 male, 8 female, 1 other), aged 20–40, with cultural backgrounds from North America, Europe, Southeast Asia, the Middle East, and Africa. All participants had completed at least an undergraduate degree and were compensated $5 for the 30-minute study.

Participants were seated and invited to inspect the inanimate robots, then instructed to use one hand to touch the robot and the other to use the mouse (Figure 4.2).
They were given the task of rating each behaviour on a 5-point semantic differential (−2 Mismatch to +2 Match) for four situations where the robot was stressed, excited, depressed, or relaxed (see Figure 4.3). For instance, for "FlexiBit feels stressed", a participant would play the behaviour and rate how well it matched the robot portraying stress.

Figure 4.2: Experimental setup showing a participant touching the FlexiBit and rating behaviours. The screen's quadrants present the four situation descriptions.

During playback and rating, hands obscured participants' view of the robot; motion was experienced largely haptically. Noise-cancelling headphones played pink noise to mask mechanical noises; instructions were communicated by microphone.

Ratings for each robot were performed separately. Robot block order was counterbalanced, with a 2-minute rest. For each block, all four emotions were presented on the same screen so participants could compare globally. Behaviours (15s clips) could be played at will during the block.

Order of behaviours and emotions was randomised by participant for the first robot. To reduce cognitive load, participants saw the same behaviour/emotion order for the second block. In total, each participant performed 64 ratings (8 behaviours × 4 emotions × 2 robots). Each session took ∼30 minutes including a post-experiment interview.

Figure 4.3: Close-up of the interface participants used to rate behaviours.

Quantitative data included the 64 behaviour ratings per participant and per-situation completion time, estimated by summing the time the mouse cursor spent within each of the four quadrants of the interface.
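The dwell-time estimate just described (accumulating time per screen quadrant from cursor samples) can be sketched as follows; the sample trace, screen dimensions, and quadrant numbering are illustrative assumptions, not the experiment software:

```python
def quadrant(x, y, width, height):
    """Map a cursor position to one of four screen quadrants, numbered 0..3."""
    return (1 if x >= width / 2 else 0) + (2 if y >= height / 2 else 0)

def dwell_times(samples, width=1920, height=1080):
    """Accumulate the time between successive cursor samples into the quadrant
    the cursor occupied at the start of each interval.
    samples: list of (t_seconds, x, y) tuples, sorted by time."""
    totals = [0.0, 0.0, 0.0, 0.0]
    for (t0, x0, y0), (t1, _, _) in zip(samples, samples[1:]):
        totals[quadrant(x0, y0, width, height)] += t1 - t0
    return totals

# Illustrative trace: 3 s in the top-left quadrant, then 2 s in the bottom-right.
trace = [(0.0, 100, 100), (3.0, 1800, 1000), (5.0, 1800, 1000)]
print(dwell_times(trace))  # [3.0, 0.0, 0.0, 2.0]
```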
We used time spent evaluating behaviours as a proxy for task difficulty, i.e., the challenge of aligning robot behaviours with emotion.

4.3 Results

We ran pairwise Wilcoxon signed-rank tests with Bonferroni correction. Ratings of the two designed behaviours for the same situation showed no significant differences (α = .050/8 = .006; all p's ≥ .059). Thus, we averaged ratings into four pairs by emotion target (e.g., (1) & (2) in Figure 4.4); the pairs appear on the y-axis of Figure 4.5. The x-axis displays the four emotions. Darker colours indicate higher participant ratings; in the ideal case (where participants think the behaviours match the situations researchers designed them for), the darkest colours appear on the diagonal.

Figure 4.4: Waveforms (amplitude over 15 seconds) of the eight researcher-designed behaviours, two per target emotion: Stressed (high arousal, negative valence), Excited (high arousal, positive valence), Relaxed (low arousal, positive valence), and Depressed (low arousal, negative valence).

Ratings are grouped by the emotion a behaviour was designed to represent and by the situation for which it was rated; darker colours on the diagonal indicate that ratings matched the design intention. For example, the behaviours designed for Stressed were rated a better match for the Excited situation.

Effect of situation on behaviour ratings. Friedman's test on behaviour ratings showed significant differences between behaviours per situation for both robots (all p's < .001).
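The omnibus-then-post-hoc procedure used in this analysis (Friedman's test across repeated-measures conditions, followed by pairwise Wilcoxon signed-rank tests at a Bonferroni-corrected α) can be sketched with scipy; the ratings here are synthetic stand-ins for the real data, not our results:

```python
from itertools import combinations
import numpy as np
from scipy.stats import friedmanchisquare, wilcoxon

rng = np.random.default_rng(1)
# Synthetic ratings: 20 participants x 4 behaviour conditions on a -2..+2 scale.
n = 20
ratings = {
    "Stressed":  rng.integers(1, 3, n),    # high-arousal behaviours rated high
    "Excited":   rng.integers(0, 3, n),
    "Depressed": rng.integers(-2, 1, n),   # low-arousal behaviours rated low
    "Relaxed":   rng.integers(-2, 1, n),
}

# Omnibus test across the four repeated-measures conditions.
stat, p_omnibus = friedmanchisquare(*ratings.values())
print(f"Friedman chi2={stat:.2f}, p={p_omnibus:.4f}")

if p_omnibus < 0.05:
    pairs = list(combinations(ratings, 2))   # 6 pairwise comparisons
    alpha = 0.05 / len(pairs)                # Bonferroni: .050/6 = .008
    for a, b in pairs:
        _, p = wilcoxon(ratings[a], ratings[b])
        print(f"{a}-{b}: p={p:.4f} {'*' if p < alpha else ''}")
```

Dividing α by the number of comparisons (here 6) is what produces the corrected thresholds quoted in this chapter, e.g. α = .050/6 = .008.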
Post hoc analyses using Wilcoxon signed-rank tests were conducted with a Bonferroni correction (α = .050/6 = .008) to further analyse the effect of situation condition on researcher-designed behaviours (Figure 4.6):

– Stressed, Excited, or Relaxed: Significant differences between high and low arousal behaviours (Stressed-Depressed, Stressed-Relaxed, Excited-Depressed and Excited-Relaxed, all p's ≤ .002). No significant differences between behaviours with the same arousal level but different valence content.

– Depressed: No significant differences between three high and low arousal behaviour pairs. A significant difference between behaviours with the same arousal level but different valence content (Stressed-Excited, p ≤ .007).

Figure 4.5: Mean behaviour ratings for FlexiBit grouped by the researcher-designed behaviours (horizontal) and the situation for which the behaviours were rated by participants (vertical). Researcher-designed behaviours correspond with (a) to (h) in Fig. 4.4.

For three of the four situation conditions, participant ratings of behaviours were decisive (high colour contrast between behaviour ratings on the Figure 4.5 y-axis). Specifically, by situation condition and researcher-designed behaviours:

For situation = Relaxed, Excited, or Stressed, pairwise comparisons between all researcher-generated low and high arousal behaviours showed significant differences in ratings (Depressed-Excited, Depressed-Stressed, Relaxed-Excited and Relaxed-Stressed, all p's ≤ .002). No significant differences were found for ratings of behaviours with the same arousal level but different valence content: Depressed-Relaxed and Stressed-Excited (p ≥ .017).

For situation = Depressed, pairwise comparisons showed significant differences between the following low and high arousal behaviour pairs with both robots: Depressed-Excited and Relaxed-Excited (p's ≤ .001).
Also, a significant difference was found between the low and high arousal pair Depressed-Stressed with FlexiBit (p ≤ .004). With RibBit, a significant difference was found between the positive and negative valence pair Excited-Stressed (p ≤ .007).

This held for Depressed with some exceptions: no significant differences were found with either robot when comparing ratings of Relaxed and Stressed behaviours (p's ≥ .014). In addition, with RibBit, no significant differences were found for the ratings of Depressed and Stressed behaviours (p = .012); however, comparison of Excited and Stressed revealed a significant difference (p ≤ .007).

Figure 4.6: Pairwise comparison p-values (Wilcoxon) of behaviours (rows) for different situation conditions (columns); significant differences are darker. Notice RibBit (S-E, D): in the Depressed condition, Stressed and Excited were rated significantly differently.

               FlexiBit                    RibBit
          S     E     D     R         S     E     D     R
   S-E  .391  .142  .159  .076      .759  .037  .007  .017
   S-D  .001  .000  .004  .000      .000  .000  .012  .000
   S-R  .001  .000  .014  .001      .001  .002  .032  .001
   E-D  .000  .000  .000  .000      .000  .000  .000  .000
   E-R  .000  .000  .000  .000      .000  .000  .000  .000
   D-R 1.000  .713  .668  .501      .582  .270  .713  .668

Effect of robot on behaviour ratings (not significant). Wilcoxon signed-rank tests using Bonferroni correction showed no significant differences in ratings between the two robots (α = .050/16 = .003; all p's ≥ .026).

Duration (not significant). A two-way (2 robots × 4 situations) repeated measures ANOVA showed no significant difference in the time spent rating behaviours.

4.4 Discussion

We address our findings with respect to our hypotheses.

Hypothesis 1: Different levels of arousal are easier to interpret than different levels of valence. — Accepted

In general, participants were able to perceive differences in behaviours designed to convey high or low arousal.
The parameter most often mentioned by participants as communicating arousal variation in our behaviour design was speed or frequency: low arousal from low frequency and high arousal from high frequency. This confirms that this 1-DOF display is able to reproduce earlier findings [30, 80, 110]. High or low physical activation signals are easily distinguishable and are good indicators of alertness, evidenced by results where consistent arousal states were well matched.

As hypothesised, participants were less able to interpret valence from robot movement. This has also been a challenge for other physical displays [30, 80]. Possible reasons include: an ineffective behavioural language for valence polarity (non-periodic, asymmetric signal shapes); breathing as a behaviour might not naturally convey valence variations and/or additional DOFs are needed to disambiguate them; or materiality played a role (less likely, considering the consistency between our two prototypes).

Unexpectedly, ratings for the Depressed situation diverged significantly. Interviews suggest two reasons: (a) Depressed was being conflated with Stressed; participants reported experiencing both emotions in concert or as a result of the other. And (b) some participants simply were not convinced that the robots, and the RibBit in particular, could express a depressed state via breathing behaviour alone.

Suggestions to improve believability and differentiability for Stressed included sighing and avoidance actions like retreating or turning away. Out of scope for this paper, this will inform future behavioural and actuation design.

Hypothesis 2: FlexiBit's behaviour will be perceived as conveying more positive valence than RibBit's due to its softer and more pleasant feel. — Rejected

Post-study interviews revealed that participants experienced the movement expressed by the two robot forms as sensorially, though not necessarily emotionally, different.
FlexiBit felt nicer to touch, but its motion was less precise than that of the RibBit. RibBit's movements were interpreted as breathing or a heartbeat despite the exposed inner workings reducing the 'lifelikeness' of the forms.

Unexpectedly, while participants specified preferences for FlexiBit's fur and RibBit's motor precision, pairwise comparisons of the same emotions revealed no significant difference between robots. Movement rather than materiality dominated how participants interpreted emotional expression; although visual access to form was restricted during movement, tactility might have modulated perception of, e.g., life-likeness.

Chapter 5: Conclusions and Future Work

This thesis explores one full iteration of the HRI loop in affective touch communication using a custom-built sensor for our CuddleBot family of therapeutic robot pets, through three distinct studies, each described in its own chapter. We tested our sensor on classification of gestural touch (N1 = 10, N2 = 16) with results at 79%–95% accuracy (chance 14%) depending on noise factors, consistent with literature values. Using the same custom touch sensor, we ran affect classification on multimodal experienced-emotion data (N = 30) with overall accuracy rates of 83%, 87%, and 99% (chance 25%) on touch-only, touch + gaze, and touch + gaze + biometric data respectively, where random-forest models were trained with test participants' data included. And finally, as breathing characteristics have been shown to have recognizable emotional properties [13], we asked participants (N = 20) to label and rate emotional breathing behaviours on two distinct 1-DOF robots.
Results showed that high arousal designs (Stressed and Excited) were reliably distinguished from low arousal designs (Depressed and Relaxed); distinctions between negative valence (Stressed and Depressed) and positive valence (Excited and Relaxed) were more difficult.

From automatic affect detection to human-recognizable affective robot behaviours, we have demonstrated the feasibility of the full HRI loop. The following outlines the impacts of each study and describes future work that builds on findings from this thesis.

5.1 Outcomes and Impacts

Each of the three aforementioned studies forms a chapter of this thesis and is either a previously published work (Chapter 2 [19]), in preparation for publication (Chapter 3), or part of a larger ongoing study (Chapter 4). In each case, the impacts extend beyond this thesis, from informing research directions within the lab to fueling interests of the larger community.

Gesture Recognition and Touch Sensor:

The biggest impact arising from our conference paper [19] reported in Chapter 2 is that of the ICMI Grand Challenge. Led by Merel Jung of the University of Twente in the Netherlands, we published two datasets, one from our work dubbed Human-Animal Affective Robot Touch (HAART), the other previously published and titled the Corpus of Social Touch (CoST) [52]. The challenge posed to the community involved developing a classification model to better understand or improve on the dataset authors' work. We received 10 submissions from around the world with subject-independent¹ results ranging from 35% to 71% accuracy using Support Vector Machines (SVM), neural nets, and random forests, to name a few (results lower than the subject-dependent values published by us). Of these, four teams presented papers discussing their techniques and results at ICMI '15 in Seattle, WA [53].

More locally, we are planning both improvements and full redesigns of our custom sensing apparatus.
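As context for these sensing questions, the two feature families used in our classification work, pressure-location and frequency features, can be illustrated on a window of taxel frames. The array shapes, polling rate, band cutoff, and specific statistics below are illustrative assumptions, not the exact feature set used in our studies:

```python
import numpy as np

def touch_features(window, poll_hz=54):
    """Extract simple pressure-location and frequency features from one
    window of touch-sensor frames, shaped (time, rows, cols)."""
    t, rows, cols = window.shape
    total = window.sum(axis=(1, 2))              # total pressure per frame

    # Pressure-location features: how hard and where the touch is.
    ys, xs = np.mgrid[0:rows, 0:cols]
    w = window.sum(axis=0)                       # time-summed pressure image
    centroid_row = (w * ys).sum() / max(w.sum(), 1e-9)
    centroid_col = (w * xs).sum() / max(w.sum(), 1e-9)

    # Frequency features: energy of pressure variation in a low band,
    # e.g. capturing stroke/pat rhythm below ~5 Hz.
    spectrum = np.abs(np.fft.rfft(total - total.mean())) ** 2
    freqs = np.fft.rfftfreq(t, d=1.0 / poll_hz)
    low_band = spectrum[(freqs > 0) & (freqs <= 5)].sum()

    return {
        "mean_pressure": float(total.mean()),
        "max_pressure": float(window.max()),
        "centroid_row": float(centroid_row),
        "centroid_col": float(centroid_col),
        "low_band_energy": float(low_band),
    }

# Example: 2 s of frames from a 10x10 taxel grid sampled at 54 Hz.
window = np.random.default_rng(2).random((108, 10, 10))
feats = touch_features(window)
print(feats)
```

Windowing a per-frame feature vector like this is also what makes the sampling-density question concrete: halving the polling rate halves the highest frequency the spectrum can represent.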
While the fabric sensor has served us well for data collection, we can't help but explore questions on biophysical requirements of polling rate and resolution. While human control of movement is roughly 5–10Hz [92], human recognition of tactile sensation is orders of magnitude higher, at up to 10kHz [72]. So while a robot skin that polls at 54Hz will capture human movement, it may yet fall short of human sensory ability. On the other hand, this level of sensing would overwhelm computational load, and it's not clear whether rates this high are even necessary. Current planned studies are aimed at determining the required range of resolution and polling rate of tactile sensing skins. By exploring sensing mechanisms, our lab has generated a number of new tactile sensors using conductive paint, weaving with conductive thread and yarn, as well as silicone-embedded capacitors (developed in collaboration with UBC engineering students).

¹ Subject-independence indicates that classification models know nothing of test participants: models are trained and tested on data sets with mutually exclusive sets of participants, e.g., P1's data appears in either the training set OR the test set but not both. In contrast, subject-dependent testing refers to training and testing on the same participant, where the model may learn something of participant behaviour.

Affect Recognition:

A version of Chapter 3's data collection and work on classification of emotion is targeted for an upcoming journal deadline. In the longer term, it has also inspired further studies for determining emotional trajectory. We considered steady-state emotions to begin our exploration of how to classify affective states; however, this is only a simplified model of the full emotional experience [17]. Emotion regulation is an important mechanism for coping with negative feelings, a natural but highly individualistic process that will influence the emotional path [40].
We plan an investigation into the artifacts of touch that may help us understand if and when a change occurs, which may allow us to develop robot systems that act as a catalyst to hasten or improve that trajectory toward more positive-valence emotions.

Affective Robot Behaviour:

The study of haptically recognizable affective robot behaviours led us to analyze breathing patterns [13], largely due to their calming effects [88]. The findings in Chapter 4 are part of a larger series of studies that further explore the range of affective breathing by expanding the emotion set in both design and interpretation. These behaviours will form a more complete set of complex affective robot reactions to human input. In the meantime, the sketching and generation of these behaviours have been demonstrated at two venues already: once as a mapping from gesture to reaction as a demo at ICMI '15 in Seattle, USA [18], and then as a behaviour display vehicle at EuroHaptics '16 in London, UK [15].

5.2 Future Work

The robots used in the studies of this thesis range from the simple 1-DOF haptic displays of the CuddleBits to the lap-sized 5-DOF modular form of the CuddleBot. Breathing behaviours created on the CuddleBits not only help form our impressions of affective reaction recognition, but also inform further refinement of the larger, more mature CuddleBot's requirements. Beyond breathing mechanisms, we are also exploring many of the CuddleBot's degrees of freedom independently; as posture is also highly expressive [26], we target spinal movements that effectively create curling and stretching behaviours to reflect fear or dismay, and relaxation or welcome, respectively.

Improving affect detection and building a complex set of believable and recognizable affective behaviours begins our expansion of Yohanan's HRI loop [110]. The original interaction loop suggests a naive model wherein human output is assessed and categorized, mapping to a robot reaction.
However, this is not how we expect to interact with each other, nor how we communicate emotionally with our animals. The most natural affective exchanges follow a more conversational model [16, 89], which makes use of error correction as well as posterior maximum likelihood calculations to develop smarter behaviour iterations that acknowledge human adjustments to displayed behaviours. Future work towards real-time use of our therapy robot will include the creation of a more robust HRI model detailing a probabilistic decision process to determine the most appropriate robot reaction to human behaviour.

Bibliography

[1] R. Adolphs. Neural systems for recognizing emotion. Current Opinion in Neurobiology, 12(2):169–177, 2002. → pages 38

[2] J. Allen and K. E. MacLean. Personal space invaders: Exploring robot-initiated touch-based gestures for collaborative robotics. In Proc ACM/IEEE Int'l Conf on Human-Robot Interaction - Extended Abstracts, pages 185–186, 2015. → pages 18

[3] J. Allen, L. Cang, M. Phan-Ba, A. Strang, and K. MacLean. Introducing the cuddlebot: A robot that responds to touch gestures. In Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction Extended Abstracts, pages 295–295. ACM, 2015. → pages 5, 7, 13, 81

[4] K. Altun and K. E. MacLean. Recognizing affect in human touch of a robot. Pattern Recognition Letters, 66:31–40, 2015. → pages 2, 8, 10, 36, 39, 44, 45, 48, 55, 57, 58, 64, 75

[5] B. M. Appelhans and L. J. Luecken. Heart rate variability as an index of regulated emotional responding. Review of General Psychology, 10(3):229, 2006. → pages 46

[6] B. M. Appelhans and L. J. Luecken. Heart rate variability and pain: Associations of two interrelated homeostatic processes. Biological Psychology, 77(2):174–182, 2008. ISSN 03010511. doi:10.1016/j.biopsycho.2007.10.004. → pages 46, 65

[7] B. D. Argall and A. G. Billard. A survey of tactile human–robot interactions. Robotics and Autonomous Systems, 58(10):1159–1176, 2010. → pages 18, 19

[8] M. R.
Banks and W. A. Banks. The effects of animal-assisted therapy on loneliness in an elderly population in long-term care facilities. The Journals of Gerontology Series A: Biological Sciences and Medical Sciences, 57(7):M428–M432, 2002. → pages 39

[9] M. R. Banks, L. M. Willoughby, and W. A. Banks. Animal-assisted therapy and loneliness in nursing homes: use of robotic versus living dogs. J American Medical Directors Assoc, 9(3):173–177, 2008. → pages 18

[10] S. B. Barker and K. S. Dawson. The effects of animal-assisted therapy on anxiety ratings of hospitalized psychiatric patients. Psychiatric Services, 1998. → pages 1, 39

[11] J. Bates et al. The role of emotion in believable agents. Communications of the ACM, 37(7):122–125, 1994. → pages 82

[12] M. Bekoff and J. Goodall. The emotional lives of animals: A leading scientist explores animal joy, sorrow, and empathy – and why they matter. New World Library, 2007. → pages 2

[13] S. Bloch, M. Lemeignan, and N. Aguilera-T. Specific respiratory patterns distinguish among human basic emotions. Intl J Psychophysiology, 11(2):141–154, 1991. → pages 9, 81, 83, 91, 93

[14] J. Broekens, M. Heerink, and H. Rosendal. Assistive social robots in elderly care: a review. Gerontechnology, 8(2):94–103, 2009. → pages 81

[15] P. Bucci, L. Cang, M. Chun, D. Marino, O. Schneider, H. Seifi, and K. E. MacLean. Cuddlebits: an iterative prototyping platform for complex haptic display. In International Conference on Human Haptic Sensing and Touch Enabled Computer Applications. Springer, 2016. → pages 7, 93

[16] B. R. Burleson. The experience and effects of emotional support: What the study of cultural and gender differences can tell us about close relationships, emotion, and interpersonal communication. Personal Relationships, 10(1):1–23, 2003. → pages 94

[17] R. A. Calvo, S. D'Mello, J. Gratch, and A. Kappas. The Oxford Handbook of Affective Computing. Oxford University Press, USA, 2014. → pages 93

[18] L. Cang, P. Bucci, and K. E. MacLean.
Cuddlebits: Friendly, low-cost furballs that respond to touch. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pages 365–366. ACM, 2015. → pages 7, 93

[19] X. L. Cang, P. Bucci, A. Strang, J. Allen, K. MacLean, and H. Liu. Different strokes and different folks: Economical dynamic surface sensing and affect-related touch recognition. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pages 147–154. ACM, 2015. → pages viii, 5, 13, 41, 48, 53, 54, 55, 57, 58, 61, 92

[20] J. Chang, K. MacLean, and S. Yohanan. Gesture recognition in the haptic creature. In Eurohaptics, pages 385–391, 2010. → pages 1, 19

[21] W.-L. Chang, S. Šabanović, and L. Huber. Use of seal-like robot paro in sensory group therapy for older adults with dementia. In Proceedings of the 8th ACM/IEEE International Conference on Human-Robot Interaction, pages 101–102. IEEE Press, 2013. → pages 6

[22] J. A. Coan and J. J. Allen. Handbook of Emotion Elicitation and Assessment. Oxford University Press, 2007. → pages 44, 51

[23] J. Cohen. A power primer. Psychological Bulletin, 112(1):155, 1992. → pages 27

[24] K. M. Cole, A. Gawlinski, N. Steers, and J. Kotlerman. Animal-assisted therapy in patients hospitalized with heart failure. American J of Critical Care, 16(6):575–585, 2007. → pages 1, 3, 18

[25] C. Conati. Intelligent tutoring systems: New challenges and directions. In IJCAI, volume 9, pages 2–7, 2009. → pages ii, 8

[26] M. Coulson. Attributing emotion to static body postures: Recognition accuracy, confusions, and viewpoint dependence. Journal of Nonverbal Behavior, 28(2):117–139, 2004. → pages 94

[27] J. R. Crawford and J. D. Henry. The positive and negative affect schedule (panas): Construct validity, measurement properties and normative data in a large non-clinical sample. British Journal of Clinical Psychology, 43(3):245–265, 2004. → pages 43

[28] K. Dautenhahn.
Design spaces and niche spaces of believable social robots. In Robot and Human Interactive Communication, 2002. Proceedings. 11th IEEE International Workshop on, pages 192–197. IEEE, 2002. → pages 81

[29] K. Dautenhahn. Socially intelligent robots: dimensions of human–robot interaction. Philosophical Trans of the Royal Society B: Biological Sciences, 362(1480):679–704, 2007. → pages 81

[30] J. Q. Dawson, O. S. Schneider, J. Ferstay, D. Toker, J. Link, S. Haddad, and K. MacLean. It's alive!: exploring the design space of a gesturing phone. In Proceedings of Graphics Interface 2013, pages 205–212. Canadian Information Processing Society, 2013. → pages 83, 89

[31] L. L. Di Stasi, D. Contreras, J. J. Canas, A. Cándido, A. Maldonado, and A. Catena. The consequences of unexpected emotional sounds on driving behaviour in risky situations. Safety Science, 48(10):1463–1468, 2010. → pages 1

[32] P. Ekman, R. W. Levenson, and W. V. Friesen. Autonomic nervous system activity distinguishes among emotions. Science, 221(4616):1208–1210, 1983. → pages 8, 36, 45

[33] A. Flagg and K. MacLean. Affective touch gesture recognition for a furry zoomorphic machine. In Proc Intl Conf on Tangible, Embedded and Embodied Interaction, pages 25–32, 2013. → pages 1, 8, 10, 13, 18, 19, 20, 21, 23, 25, 27, 40, 41, 45, 48, 55, 57, 58, 61

[34] T. Fong, I. Nourbakhsh, and K. Dautenhahn. A survey of socially interactive robots. Robotics and Autonomous Systems, 42(3):143–166, 2003. → pages 38, 81

[35] G. J. Gelderblom, R. Bemelmans, N. Spierts, P. Jonker, and L. De Witte. Development of paro interventions for dementia patients in dutch psycho-geriatric care. In International Conference on Social Robotics, pages 253–258. Springer, 2010. → pages 6

[36] B. Gleeson, K. MacLean, A. Haddadi, E. Croft, and J. Alcazar. Gestures for industry: intuitive human-robot communication from human observation. In Proc. ACM/IEEE Int'l Conf on Human-Robot Interaction, pages 349–356, 2013. → pages 1, 18, 81

[37] P. Good.
Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses. Springer Science & Business Media, 2013. → pages 25

[38] K. Goris, J. Saldien, I. Vanderniepen, and D. Lefeber. The huggable robot probo, a multi-disciplinary research platform. In International Conference on Research and Education in Robotics, pages 29–41. Springer, 2008. → pages 6

[39] J. J. Gross. The emerging field of emotion regulation: an integrative review. Review of General Psychology, 2(3):271, 1998. → pages 76

[40] J. J. Gross. Handbook of Emotion Regulation. Guilford Publications, 2013. → pages 93

[41] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The weka data mining software: an update. ACM SIGKDD Explorations Newsletter, 11(1):10–18, 2009. → pages 25, 57

[42] J. T. Hancock, K. Gee, K. Ciaccio, and J. M.-H. Lin. I'm sad you're sad: emotional contagion in cmc. In Proceedings of the 2008 ACM Conference on Computer Supported Cooperative Work, pages 295–298. ACM, 2008. → pages 43

[43] M. J. Hertenstein, D. Keltner, B. App, B. A. Bulleit, and A. R. Jaskolka. Touch communicates distinct emotions. Emotion, 6(3):528, 2006. → pages 1, 18, 36, 44, 45, 48

[44] M. J. Hertenstein, R. Holmes, M. McCullough, and D. Keltner. The communication of emotion via touch. Emotion, 9(4):566, 2009. → pages 18

[45] R. Hill. Perceptual attention in virtual humans: Toward realistic and believable gaze behaviors. In Proceedings of the AAAI Fall Symposium on Simulating Human Agents, pages 46–52, 2000. → pages 46

[46] K. Inoue, K. Wada, and R. Uehara. How effective is robot therapy?: Paro and people with dementia. In Intl Fed for Medical and Biological Engineering, pages 784–787, 2012. → pages 1, 5, 6, 7, 10, 18

[47] N. Jafarinaimi, J. Forlizzi, A. Hurst, and J. Zimmerman. Breakaway: an ambient display designed to change human behavior. In CHI'05 Extended Abstracts on Human Factors in Computing Systems, pages 1945–1948. ACM, 2005. → pages 83

[48] N. Jaques, C. Conati, J. M. Harley, and R. Azevedo.
Predicting affect from gaze data during interaction with an intelligent tutoring system. In Intelligent Tutoring Systems, pages 29–38. Springer, 2014. → pages 1, 8, 36, 39, 41, 46

[49] C. M. Jones and T. Troen. Biometric valence and arousal recognition. In Proceedings of the 19th Australasian Conference on Computer-Human Interaction: Entertaining User Interfaces, pages 191–194. ACM, 2007. → pages 46

[50] H. Y. Joung and E. Y.-L. Do. Tactile hand gesture recognition through haptic feedback for affective online communication. In Universal Access in Human-Computer Interaction, pages 555–563. Springer, 2011. → pages 15, 17, 18, 34

[51] M. M. Jung. Towards social touch intelligence: developing a robust system for automatic touch recognition. In Proc Intl Conf on Multimodal Interaction, pages 344–348, 2014. → pages 1, 19, 33, 40, 45, 58

[52] M. M. Jung, R. Poppe, M. Poel, and D. K. Heylen. Touching the void – introducing CoST: Corpus of social touch. In Proc Intl Conf on Multimodal Interaction, pages 120–127, 2014. → pages 18, 20, 27, 92

[53] M. M. Jung, X. L. Cang, M. Poel, and K. E. MacLean. Touch challenge '15: Recognizing social touch gestures. In Proceedings of the 2015 ACM on International Conference on Multimodal Interaction, pages 387–390. ACM, 2015. → pages 92

[54] M. Kanamori, M. Suzuki, and M. Tanaka. Maintenance and improvement of quality of life among elderly patients using a pet-type robot. Japanese J. Geriatrics, 39(2):214–218, 2002. → pages 1

[55] J. Kangas, J. Rantala, D. Akkil, P. Isokoski, P. Majaranta, and R. Raisamo. Both fingers and head are acceptable in sensing tactile feedback of gaze gestures. In International Conference on Human Haptic Sensing and Touch Enabled Computer Applications, pages 99–108. Springer, 2016. → pages 49

[56] H. H. Kelley. Attribution theory in social psychology. In Nebraska Symposium on Motivation. University of Nebraska Press, 1967. → pages 82

[57] J. Kim and E. André. Emotion recognition based on physiological changes in music listening.
Pattern Analysis and Machine Intelligence, IEEETransactions on, 30(12):2067–2083, 2008. → pages 41, 44, 46, 47, 61, 64,65[58] R. Kohavi et al. A study of cross-validation and bootstrap for accuracyestimation and model selection. In Ijcai, volume 14, pages 1137–1145,1995. → pages 26[59] S. G. Koolagudi, N. Kumar, and K. S. Rao. Speech emotion recognitionusing segmental level prosodic analysis. In Devices and Communications(ICDeCom), 2011 International Conference on, pages 1–5. IEEE, 2011. →pages 36100[60] J. Kortelainen, S. Tiinanen, X. Huang, X. Li, S. Laukka, M. Pietikainen,and T. Seppanen. Multimodal emotion recognition by combiningphysiological signals and facial expressions: a preliminary study. InEngineering in Medicine and Biology Society (EMBC), 2012 AnnualInternational Conference of the IEEE, pages 5238–5241. IEEE, 2012. →pages 36, 46[61] A. Kurauchi, W. Feng, A. Joshi, C. Morimoto, and M. Betke. EyeSwipe. InProceedings of the 2016 CHI Conference on Human Factors in ComputingSystems - CHI ’16, pages 1952–1956, New York, New York, USA, 2016.ACM Press. ISBN 9781450333627. doi:10.1145/2858036.2858335. URLhttp://doi.acm.org/10.1145/2858036.2858335http://dl.acm.org/citation.cfm?doid=2858036.2858335. → pages 49[62] R. W. Levenson. Autonomic nervous system differences among emotions.Psychological science, 3(1):23–27, 1992. → pages 8, 36, 45[63] J. Marescaux, J. Leroy, M. Gagner, F. Rubino, D. Mutter, M. Vix, S. E.Butner, and M. K. Smith. Transatlantic robot-assisted telesurgery. Nature,413(6854):379–380, 2001. → pages 7[64] P. Marti, A. Pollini, A. Rullo, and T. Shibata. Engaging with artificial pets.In Proc. Annual Conf European Assoc Cognitive Ergonomics, pages99–106, 2005. → pages 18[65] A. J. Mitchell, C. Yates, K. Williams, and R. W. Hall. Effects of dailykangaroo care on cardiorespiratory parameters in preterm infants. Journalof neonatal-perinatal medicine, 6(3):243–249, 2013. → pages 83[66] A. Moon, D. M. Troniak, B. Gleeson, M. K. Pan, M. Zheng, B. A. 
Blumer,K. MacLean, and E. A. Croft. Meet me where i’m gazing: how sharedattention gaze affects human-robot handover timing. In Proceedings of the2014 ACM/IEEE international conference on Human-robot interaction,pages 334–341. ACM, 2014. → pages 81[67] M. C. Moy. Gesture-based interaction with a pet robot. In AAAI/IAAI,pages 628–633, 1999. → pages 83[68] K. Nakajima, Y. Itoh, Y. Hayashi, K. Ikeda, K. Fujita, and T. Onoye.Emoballoon. In Advances in Computer Entertainment, pages 182–197.Springer, 2013. → pages 27101[69] M. Nardelli, G. Valenza, A. Greco, A. Lanata, and E. P. Scilingo.Recognizing emotions induced by affective sounds through heart ratevariability. IEEE Transactions on Affective Computing, 6(4):385–394,2015. doi:10.1109/TAFFC.2015.2432810. → pages 46[70] C. Nguyen and F. Liu. Gaze-based Notetaking for Learning from LectureVideos. In Proceedings of the 2016 CHI Conference on Human Factors inComputing Systems - CHI ’16, pages 2093–2097, New York, New York,USA, 2016. ACM Press. ISBN 9781450333627.doi:10.1145/2858036.2858137. URLhttp://doi.acm.org/10.1145/2858036.2858137http://dl.acm.org/citation.cfm?doid=2858036.2858137. → pages 49[71] J. Odendaal. Animal-assisted therapymagic or medicine? Journal ofpsychosomatic research, 49(4):275–280, 2000. → pages 39[72] M. A. Otaduy and M. C. Lin. High fidelity haptic rendering. SynthesisLectures on Computer Graphics and Animation, 1(1):1–112, 2006. →pages 58, 92[73] B. Pang and L. Lee. Opinion mining and sentiment analysis. Foundationsand trends in information retrieval, 2(1-2):1–135, 2008. → pages 36[74] T. Partala and V. Surakka. Pupil size variation as an indication of affectiveprocessing. Int. J. Hum.-Comput. Stud., 59(1-2):185–198, July 2003. ISSN1071-5819. doi:10.1016/S1071-5819(03)00017-X. URLhttp://dx.doi.org/10.1016/S1071-5819(03)00017-X. → pages 36, 46[75] E. W. Pedersen, S. Subramanian, and K. Hornbæk. Is my phone alive?: alarge-scale study of shape change in handheld devices using videos. 
InProceedings of the 32nd annual ACM conference on Human factors incomputing systems, pages 2579–2588. ACM, 2014. → pages 83[76] E. Peper, R. Harvey, I.-m. Lin, H. Tylova, and D. Moss. Is There More toBlood Volume Pulse Than Heart Rate Variability , Respiratory SinusArrhythmia , and Cardiorespiratory Synchrony ? Biofedback, 35(2):54–61,2007. ISSN 10815937. → pages 46[77] H. Perner-Wilson and M. Satomi. Diy wearable technology. In ISEA 15thInternational Symposium on Electronic Art, 2009. → pages 13[78] D. Proulx. Animal-assisted therapy. Critical care nurse, 18(2):80, 1998.→ pages 1102[79] P. Rani, C. Liu, N. Sarkar, and E. Vanman. An empirical study of machinelearning techniques for affect recognition in human–robot interaction.Pattern Analysis and Applications, 9(1):58–69, 2006. → pages 18[80] J. Rantala, K. Salminen, R. Raisamo, and V. Surakka. Touch gestures incommunicating emotional intention via vibrotactile stimulation. Intl JHuman-Computer Studies, 71(6):679–690, 2013. → pages 89[81] N. E. Richeson. Effects of animal-assisted therapy on agitated behaviorsand social interactions of older adults with dementia. American journal ofAlzheimer’s disease and other dementias, 18(6):353–358, 2003. → pages39[82] B. Robins, K. Dautenhahn, F. Amirabdollahian, F. Mastrogiovanni, andG. Cannata. Developing skin-based technologies for interactiverobots-challenges in design, development and the possible integration intherapeutic environments. In IEEE ROMAN, pages xxii–xxiii, 2011. →pages 19[83] J. A. Russell, A. Weiss, and G. A. Mendelsohn. Affect grid: a single-itemscale of pleasure and arousal. Journal of personality and social psychology,57(3):493, 1989. → pages ii, 43, 51, 54, 83[84] J. Sabourin, B. Mott, and J. C. Lester. Modeling learner affect withtheoretically grounded dynamic bayesian networks. In Affective computingand intelligent interaction, pages 286–295. Springer, 2011. → pages 44[85] J. Saldien, K. Goris, S. Yilmazyildiz, W. Verhelst, and D. Lefeber. 
On thedesign of the huggable robot probo. J. Physical Agents, 2(2):3–12, 2008.→ pages 18[86] J. Saldien, K. Goris, B. Vanderborght, J. Vanderfaeillie, and D. Lefeber.Expressing emotions with the social robot probo. Intl J of Social Robotics,2(4):377–389, 2010. → pages 5, 6, 10, 83[87] D. D. Salvucci and J. H. Goldberg. Identifying fixations and saccades ineye-tracking protocols. In Proceedings of the symposium on Eye trackingresearch & applications - ETRA ’00, pages 71–78, New York, New York,USA, 2000. ACM Press. ISBN 1581132808. doi:10.1145/355017.355028.URL http://portal.acm.org/citation.cfm?doid=355017.355028. → pages 56[88] Y. Sefidgar, K. E. MacLean, S. Yohanan, M. Van der Loos, E. A. Croft, andJ. Garland. Design and evaluation of a touch-centered calming interaction103with a social robot. Trans Affective Computing, PP(99), 2015. → pages ii,6, 9, 17, 39, 46, 81, 83, 93[89] H. Sharp. Interaction design. John Wiley & Sons, 2003. → pages ii, 9, 76,94[90] T. Shibata. Ubiquitous surface tactile sensor. In Robotics and Automation,2004. TExCRA’04. First IEEE Technical Exhibition Based Conference on,pages 5–6. IEEE, 2004. → pages 5, 6, 7[91] T. Shibata and K. Wada. Robot therapy: a new approach for mentalhealthcare of the elderly-a mini-review. Gerontology, 57(4):378–386,2010. → pages 18[92] K. B. Shimoga. Finger force and touch feedback issues in dexteroustelemanipulation. In Proc Intelligent Robotic Systems for SpaceExploration, pages 159–178, 1992. → pages 10, 17, 58, 92[93] D. Silvera-Tawil, D. Rye, and M. Velonaki. Interpretation of social touchon an artificial arm covered with an eit-based sensitive skin. Intl J of SocialRobotics, 6(4):489–505, 2014. → pages 1, 18, 48[94] W. D. Stiehl, J. Lieberman, C. Breazeal, L. Basel, L. Lalla, and M. Wolf.Design of a therapeutic robotic companion for relational, affective touch.In ROMAN Workshop on Robot & Human Interactive Communication,pages 408–415, 2005. → pages 5, 18[95] W. D. Stiehl, C. Breazeal, K.-H. Han, J. 
Lieberman, L. Lalla, A. Maymin,J. Salinas, D. Fuentes, R. Toscano, C. H. Tong, et al. The huggable: atherapeutic robotic companion for relational, affective touch. In ACMSIGGRAPH 2006 emerging technologies, page 15. ACM, 2006. → pages5, 38[96] W. D. Stiehl, J. K. Lee, C. Breazeal, M. Nalin, A. Morandi, and A. Sanna.The huggable: a platform for research in robotic companions for pediatriccare. In Proceedings of the 8th International Conference on interactionDesign and Children, pages 317–320. ACM, 2009. → pages 6, 10[97] T. Tamura, S. Yonemitsu, A. Itoh, D. Oikawa, A. Kawakami, Y. Higashi,T. Fujimooto, and K. Nakajima. Is an entertainment robot useful in the careof elderly people with severe dementia? J Gerontology Series A: BiologicalSciences and Medical Sciences, 59(1):M83–M85, 2004. → pages 18104[98] D. S. Tawil, D. Rye, and M. Velonaki. Touch modality interpretation for aneit-based sensitive skin. In IEEE Conf on Robotics and Automation (ICRA),pages 3770–3776, 2011. → pages 27[99] F. Thomas and O. Johnston. Disney animation: The illusion of life.Abbeville Press, 1981. → pages 82[100] J. Ulmen and M. Cutkosky. A robust, low-cost and low-noise artificial skinfor human-friendly robots. In Proc. IEEE Robotics and Automation(ICRA), pages 4836–4841, 2010. → pages 19[101] J. D. Velasquez. An emotion-based approach to robotics. In IntelligentRobots and Systems, 1999. IROS’99. Proceedings. 1999 IEEE/RSJInternational Conference on, volume 1, pages 235–240. IEEE, 1999. →pages 83[102] K. Wada and T. Shibata. Living with seal robotsits sociopsychological andphysiological influences on the elderly at a care house. Robotics, IEEETransactions on, 23(5):972–980, 2007. → pages 38[103] K. Wada, T. Shibata, T. Saito, K. Sakamoto, and K. Tanie. Psychologicaland social effects of one year robot assisted activity on elderly people at ahealth service facility for the aged. In Robotics and Automation, 2005,pages 2785–2790. IEEE, 2005. → pages 1, 83[104] K. Wada, Y. Ikeda, K. 
Inoue, and R. Uehara. Development and preliminaryevaluation of a caregiver’s manual for robot therapy using the therapeuticseal robot paro. In ROMAN, pages 533–538, 2010. → pages ii, 6, 18[105] D. Watson, L. A. Clark, and A. Tellegen. Development and validation ofbrief measures of positive and negative affect: the panas scales. Journal ofpersonality and social psychology, 54(6):1063, 1988. → pages 43[106] P. Wells. The animated bestiary: animals, cartoons, and culture. RutgersUniversity Press, 2008. → pages 82[107] S. Yohanan and K. MacLean. The haptic creature project: Socialhuman-robot interaction through affective touch. In Proc AISB 2008,volume 1, pages 7–11. Citeseer, 2008. → pages 83[108] S. Yohanan and K. E. MacLean. A tool to study affective touch. In CHI’09Extended Abstracts on Human Factors in Computing Systems, pages4153–4158. ACM, 2009. → pages 5, 6, 81105[109] S. Yohanan and K. E. MacLean. Design and assessment of the hapticcreature’s affect display. In Proceedings of the 6th international conferenceon Human-robot interaction, pages 473–480. ACM, 2011. → pages 13, 83[110] S. Yohanan and K. E. MacLean. The role of affective touch in human-robotinteraction: Human intent and expectations in touching the haptic creature.International Journal of Social Robotics, 4(2):163–180, 2012. 
→ pages ii, x, 1, 2, 15, 19, 21, 38, 44, 53, 89, 94

Appendix A: Supporting Materials

A.1 Study Forms

A.1.1 Consent Form

STUDY CONSENT FORM
Department of Computer Science
2366 Main Mall, Vancouver, B.C., Canada V6T 1Z4
tel: (604) 822-3061  fax: (604) 822-4231

Project Title: Investigation of Interactive Affective Touch
Principal Investigator: Karon MacLean, Professor, Dept. of Computer Science
Co-Investigator: Xi Laura Cang, MSc Student, Dept. of Computer Science

The purpose of this study is to gather feedback to inform the interaction design of a haptic zoomorphic robot. We may ask you to interact with one or more touch-sensitive surfaces mounted on a variety of stationary and/or moving objects. We may also ask you to interact with an interface for controlling small robots, and ask you to create, manipulate, or describe the motions and perceived emotional content. We may ask you to talk about your experiences with animals and pets. This study is part of a graduate student research project. You may refuse or skip any task or question without affecting your reimbursement.

REIMBURSEMENT: We are very grateful for your participation. You will receive monetary compensation of $10 for this session.

TIME COMMITMENT: 1 × 1-hour session

RISKS & BENEFITS: This experiment contains no more risk than everyday computer use or commercially available actuated toys. There are no direct benefits to participants beyond compensation.

CONFIDENTIALITY: You will not be identified by name in any study reports. Any identifiable data gathered from this experiment will be stored in a secure Computer Science account accessible only to the experimenters. Video or audio excerpts will be edited to remove identifying information, including but not limited to obscuring face and/or voice, and will not be used in publication unless permission is explicitly given below.

AUDIO/VIDEO RELEASE: You may be asked for audio or video to be recorded during this session. You are free to say no without affecting your reimbursement.
I agree to have AUDIO recorded: ☐ Yes ☐ No
I agree to have VIDEO recorded: ☐ Yes ☐ No
I agree to have ANONYMIZED VIDEO OR AUDIO EXCERPTS presented in publications: ☐ Yes ☐ No

You understand that the experimenter will ANSWER ANY QUESTIONS you have about the instructions or the procedures of this study. After participating, the experimenter will answer any other questions you have about this study. Your participation in this study is entirely voluntary and you may refuse to participate or withdraw from the study at any time without jeopardy. Your signature below indicates that you have received a copy of this consent form for your own records and consent to participate in this study. Any questions about the study can be directed to Laura Cang (cang@cs.ubc.ca).

If you have any concerns or complaints about your rights as a research participant and/or your experiences while participating in this study, contact the Research Participant Complaint Line in the UBC Office of Research Ethics or, if long distance, email RSIL@ors.ubc.ca or call toll free.

You hereby CONSENT to participate and acknowledge RECEIPT of a copy of the consent form.

PRINTED NAME: ________________________  DATE: ________________
SIGNATURE: ________________________

A.1.2 Call for Participation Form

THE UNIVERSITY OF BRITISH COLUMBIA
Department of Computer Science
201-2366 Main Mall, Vancouver, B.C., Canada V6T 1Z4
tel: (604) 822-3061  fax: (604) 822-4231

Investigation of Interactive Affective Touch
Principal Investigator: Karon MacLean, Professor, Dept. of Computer Science, 604-822-8169
Co-Investigator: Laura Cang, MSc Student, Dept. of Computer Science, 604-827-3982
Version 2.0 / June 6, 2016

The following message will be used to recruit participants for our study. We will distribute this message using some or all of the following methods:
- Emailing the recruitment message to mailing lists maintained by the Computer Science department or our research group, such as a list of department graduate students (often used for this kind of purpose) and a list of persons who have expressed an interest in being study participants.
- Uploading the recruitment message as an online posting, on craigslist.ca or facebook.
- Physical postings in public areas.
- Email and word-of-mouth when conducting purposeful sampling.

From: Laura Cang
Subject: Call for Study Participants - $10 for Interactive Affective Touch

The Sensory, Perception, and Interaction (SPIN) Research Group in the UBC Dept. of Computer Science is looking for participants for a study investigating the sensing, design, and interpretation of emotive interactions with a small furry robot and/or other household objects. You will be compensated $10 for your participation in a single 1-hour session.

We may ask you to talk about your experiences with household pets and other animals as well as other emotion-rich stories or memories. We may ask you to interact with touch-sensitive surfaces or one or more robots that may produce any number of sounds, motions, and/or vibrations. We may also ask you to interact with a device for controlling these robots, and ask you to create, manipulate, or describe haptic (touch) sensations. Your touch and eye-gaze interactions may be recorded.

Please visit <URL> or contact me to sign up for the study. You may also contact me if you have any questions.

Laura Cang
MSc Student, UBC Computer Science
cang@cs.ubc.ca

A.2 Participant Response Forms

A.2.1 Gesture Study Demographic Questionnaire

A.2.2 Affective Rating Form

1. How real or genuine was the emotion you experienced when telling the story?
   0   1   2   3   4   5   6   7   8   9   10
   (totally artificial)        (moderately real)        (completely genuine)

2. How did you feel when telling the story? Please check the corresponding box below.
A.2.3 Robot Behaviour Interview Script

Version 0.1 / 13 October, 2015
PARTICIPANT ID: _________________      DATE: ____________________________

Semi-Structured Interview Script

Experimenter: Thank you for participating in our study. We would like to ask a few questions about your impressions of the robot display. If you require clarification or are uncomfortable for any reason, feel free to interrupt at any time.

Form-factor Impressions:
1. What are your initial impressions of each robot? How would you describe these forms (e.g., machine, animal, cartoon, ...)?
2. Where might they have come from?
3. What does it mean when they are not moving?
4. How might you use these robots?
5. Which robot seems to like you more, and why?

Comparative emotional clarity of the robot displays:
6. Do you think there were differences in how the robots were able to express their feelings?
7. Did one seem more dramatic than the other?
8. Which robot would you pick for expressing each emotion?
   Stressed / Excited / Depressed / Relaxed

Pet Preferences:
9. Have you had any experience with household pets or other domesticated animals?
   - If yes, tell me about your pet or animal.
   - If not, do you want a pet? Why or why not?
10. Do you have any questions or comments for us?

