Dialogue Act Recognition in Synchronous and Asynchronous Conversations

by

Maryam Tavafi

B.Sc., University of Tehran, 2011

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF MASTER OF SCIENCE in The Faculty of Graduate and Postdoctoral Studies (Computer Science)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

August 2013

© Maryam Tavafi 2013

Abstract

This thesis presents a domain-independent approach to dialogue act modeling across a comprehensive set of spoken and written conversations, including emails, forums, meetings, and phone conversations. We begin by investigating the performance of unsupervised methods for dialogue act recognition. The low performance of these techniques motivates us to tackle the problem in supervised and semi-supervised manners.

To this aim, we propose a domain-independent feature set for dialogue act modeling on different spoken and written conversations. We then compare the results of SVM-multiclass and of two structured predictors, SVM-hmm and CRF, for supervised dialogue act modeling, and provide an in-depth analysis of how effective the proposed domain-independent approaches are on different written and spoken conversations.

Extensive empirical results across different conversational modalities demonstrate the effectiveness of our SVM-hmm model for dialogue act recognition in conversations. Furthermore, we use the SVM-hmm algorithm to investigate the effectiveness of using unlabeled data in a semi-supervised dialogue act recognition framework.

Preface

A version of this thesis has been published at SIGDIAL '13 [40]. The paper is a summary of all the chapters of this thesis. I conducted all the experiments and wrote most of the manuscript for the published paper. Giuseppe Carenini, Raymond Ng, Yashar Mehdad and Shafiq Joty were the supervisory authors on this project and were involved throughout the project in concept formation and manuscript edits.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Equations
List of Abbreviations
Acknowledgements
Dedication
1 Introduction
2 Background and Related Work
  2.1 Background
    2.1.1 Dialogue Acts
    2.1.2 Synchronous vs. Asynchronous Conversations
  2.2 Related Work
    2.2.1 Unsupervised Dialogue Act Recognition
    2.2.2 Supervised Dialogue Act Recognition
    2.2.3 Semi-supervised Dialogue Act Recognition
3 Corpora
  3.1 Corpora
    3.1.1 Email Conversations
    3.1.2 Forum Conversations
    3.1.3 Meeting Conversations
    3.1.4 Phone Conversations
  3.2 Conversational Structure
    3.2.1 Fragment Quotation Graph
4 Methodology
  4.1 Features
  4.2 Unsupervised Dialogue Act Recognition
    4.2.1 Hidden Markov Model
    4.2.2 Mixed Membership Markov Model
  4.3 Supervised Dialogue Act Recognition
    4.3.1 SVM-hmm
    4.3.2 SVM-multiclass
    4.3.3 Conditional Random Fields
  4.4 Semi-supervised Dialogue Act Recognition
    4.4.1 Bootstrapping
5 Experiments and Results
  5.1 Evaluation Metrics
    5.1.1 Micro-average
    5.1.2 Macro-average
    5.1.3 Perplexity
    5.1.4 Variation of Information
  5.2 Experimental Settings
  5.3 Results
    5.3.1 Unsupervised Dialogue Act Recognition
    5.3.2 Supervised Dialogue Act Recognition
    5.3.3 Semi-supervised Dialogue Act Recognition
  5.4 Discussion
6 Conclusion and Future Work
Bibliography

List of Tables

2.1  Unsupervised dialogue act recognition related works.
2.2  Supervised dialogue act recognition related works.
2.3  Semi-supervised dialogue act recognition related works.
3.1  BC3 tagset.
3.2  Reduced MRDA tagset.
3.3  Reduced SWBD tagset.
3.4  Dialogue act categories and their relative frequency in all the labeled corpora.
5.1  Instance labeled with L.
5.2  Results of unsupervised dialogue act modeling using Hidden Markov Model on three datasets of BC3, CNET and TripAdvisor.
5.3  Results of unsupervised dialogue act modeling using Mixed Membership Markov Model on two datasets of CNET and TripAdvisor.
5.4  Results of supervised dialogue act modeling; columns are macro-average and micro-average accuracy.
5.5  Results of supervised dialogue act modeling; columns are the improvement of macro-average and micro-average compared to the baseline.
5.6  Results of supervised dialogue act modeling on BC3 using SVM-hmm. The tagset was reduced to 5 dialogue tags. The baseline accuracy is 69.56%, which shows the majority class.
5.7  Micro-average of supervised dialogue act modeling using SVM-hmm. The most discriminative bigrams are added as features.
5.8  Sample bigrams extracted from the first five words of the sentence by the mutual information method.
5.9  Sample bigrams extracted from the last five words of the sentence by the mutual information method.
5.10 Accuracy of semi-supervised dialogue act modeling.
5.11 SVM-hmm accuracy for different dialogue acts.

List of Figures

2.1  Sample email thread from BC3.
2.2  Sample phone conversation from SWBD.
2.3  Sample email conversation from BC3.
3.1  Email conversations and their corresponding Fragment Quotation Graph.
4.1  The graphical model for Hidden Markov Model.
4.2  The graphical models for HMM, LDA and M4 [31].
4.3  The graphical model for Conditional Random Fields.
4.4  Bootstrapping.
5.1  Comparing SVM-hmm and CRF on accuracy of each of the 12 classes of BC3.
5.2  Comparing SVM-hmm and CRF with confusion matrices.
5.3  SVM-hmm accuracy on BC3 with 5 dialogue acts.
5.4  Confusion matrix of supervised dialogue act modeling on BC3 using SVM-hmm. The tagset was reduced to 5 tags.

List of Equations

4.1  Mutual Information
4.2  Markov assumption
4.3  Viterbi algorithm
4.4  SVM-hmm learn
4.5  SVM-hmm classify
4.6  Optimization equation of SVM-multiclass
4.7  Loss function for SVM-multiclass
4.8  Conditional Random Fields
4.9  Viterbi decoding for CRF
5.1  Precision
5.2  Micro-average
5.3  Macro-average
5.4  Perplexity
5.5  Perplexity per Viterbi-tagged sentence
5.6  Variation of Information

List of Abbreviations

BOW             Bag of Words
CRF             Conditional Random Fields
DA              Dialogue Act
DT              Decision Tree
EM              Expectation Maximization
FQG             Fragment Quotation Graph
HMM             Hidden Markov Model
HMM+Mix         Hidden Markov Model-Mixture Model
IG              Information Gain
LDA             Latent Dirichlet Allocation
M4              Mixed Membership Markov Model
MI              Mutual Information
POS             Part of Speech
SVM             Support Vector Machines
SVM-hmm         Support Vector Machines-Hidden Markov Model
SVM-multiclass  Support Vector Machines-multiclassifier
TF              Term Frequency
TFIDF           Term Frequency Inverse Document Frequency

Acknowledgements

I would like to offer my gratitude to my advisers, Dr. Raymond T. Ng and Dr. Giuseppe Carenini, whose knowledge, support and encouragement allowed me to develop and pursue this thesis.

I would also like to express my gratitude to Dr. Yashar Mehdad and Dr. Shafiq Joty, for their help and guidance.

Last but not least, I would like to thank my family for their unconditional love, encouragement and support during all my studies.

To my family.

Chapter 1
Introduction

A conversation is a joint activity between two or more people, formed by taking turns to express their own ideas. In each turn, the speaker states an opinion by performing a dialogue act (e.g., a question). The listener then confirms that he has understood the speaker's intended meaning by stating another dialogue act (e.g., an answer). In Chapter 2, we will see more examples of dialogue acts.

Dialogue act recognition is the process of detecting the hidden dialogue act of each sentence of a conversation. Due to the sequential nature of discussions, the dialogue acts of a conversation are dependent: if the first speaker asks a question, the second speaker is likely to answer that question in the next turn. We should take these dependencies into account when detecting dialogue acts.

Spoken and written domains are also called synchronous and asynchronous conversations. In spoken discussions such as meeting and phone conversations, the conversation between the speakers is synchronized. However, in written conversations such as email and forum discussions, authors participate in the discussion in different orders. Chapter 2 explains this difference in more detail.

Revealing the underlying conversational structure in dialogues is important for detecting human social intentions in spoken conversations and in many applications, including summarization [29, 30], dialogue systems and dialogue games [9, 25, 43], and flirt detection [32]. As an additional example, Ravi and Kim [33] show that dialogue acts can be used for analyzing the interaction of students in educational forums.

Recently, there has been increasing interest in dialogue act recognition in spoken and written conversations, including meetings, phone conversations, emails and blogs. However, most of the previous work is specific to one of these domains. There are potentially useful features and algorithms for each of these domains, but given the underlying similarities between these types of conversations, we aim to identify a domain-independent dialogue act modeling approach that can achieve good results across all types of conversations. Such a domain-independent dialogue act recognizer makes it possible to automatically recognize dialogue acts in a wide variety of conversational data, as well as in conversations spanning multiple domains/modalities; for instance, a conversation that starts in a meeting and then continues via email.

The popularity of the World Wide Web, along with accuracy improvements in speech-to-text systems, has led to a growing volume of conversational text [6]. Although this large-scale data does not contain any form of annotation, it has good potential for learning when combined with relatively small amounts of labeled data.
Based on this observation, in this thesis we explore supervised dialogue act modeling approaches along with unsupervised and semi-supervised methods.

While previous work on dialogue act modeling has focused on studying only one [10, 13, 22, 33, 34, 37, 39, 49] or, in a few cases, a couple of conversational domains [15, 18, 31], in this thesis we analyze the performance of dialogue act modeling techniques on a comprehensive set of spoken and written conversations that includes emails, forums, meetings, and phone conversations. More specifically, we explore the performance of two sophisticated unsupervised methods, the Hidden Markov Model (HMM) and the Mixed Membership Markov Model (M4), for dialogue act recognition. In addition, we compare the performance of three state-of-the-art supervised machine learning algorithms: Support Vector Machines-multiclassifier (SVM-multiclass) and two structured predictors, Support Vector Machines-Hidden Markov Model (SVM-hmm) and Conditional Random Fields (CRF). In the next step, we use the best learner to investigate the effectiveness of using unlabeled data in addition to labeled data via semi-supervised dialogue act recognition on these domains. We present an extensive set of experiments studying the effectiveness of dialogue act modeling on different types of conversations such as emails, forums, meetings, and phone discussions. The experimental results show the low performance of unsupervised methods for the task of dialogue act modeling, and that the SVM-hmm algorithm outperforms the other supervised algorithms across all datasets.

The following chapter provides the background in the field of dialogue act modeling and reviews prior work on dialogue act recognition. Chapter 3 describes the corpora used in our experiments and explains the significance of conversational structure for dialogue act modeling. Chapter 4 introduces our domain-independent feature set and illustrates the applied unsupervised, supervised and semi-supervised algorithms. Chapter 5 presents all the experiments and results of unsupervised, supervised and semi-supervised dialogue act modeling methods, and analyzes the performance of the algorithms on spoken and written conversations. Finally, in the last chapter we conclude by recapping the achievements of our work and mentioning possible future work.

Chapter 2
Background and Related Work

2.1 Background

2.1.1 Dialogue Acts

Conversation between humans is a joint activity between two or more people. In this activity, we always have a speaker and a listener. Conversation flows through the listener's inference about the speaker's speech and his taking the turn to state his own ideas [20].

Austin [3] considers utterances as actions performed by the speaker. The verbs that specify actions, as in "I name this ship the Titanic", are called "performative verbs", and these types of actions are called "speech acts". However, speech acts are not confined to these types of verbs. Searle [35] proposes five classes of speech acts:

- Representatives: committing the speaker to something's being the case (suggesting, putting forward, swearing, boasting, concluding)
- Directives: the speaker attempts to get something done by the addressee (asking, ordering, requesting, inviting, advising, begging)
- Commissives: committing the speaker to a future action (promising, planning, vowing, betting, opposing)
- Expressives: expressing a psychological state (thanking, apologizing, welcoming, deploring)
- Declarations: declaring that a state of affairs exists (e.g., "I now pronounce you husband and wife")
A conversation is a joint activity between the speaker and listener, and it consists of dependent speech acts. The speaker presents something by performing one speech act, and the listener "grounds" the speaker, confirming that he has understood the speaker's intended meaning [20].

[Figure 2.1: Sample email thread from BC3.]

Dialogue acts are a combination of speech acts and grounding, which results in a more sophisticated model. Traum and Hinkelman [41] propose four types of dialogue acts:

- Turn-taking (take-turn, keep-turn, release-turn, assign-turn)
- Core speech acts (inform, accept, wh-question, request, offer, opinion)
- Grounding (continue, acknowledge, repair)
- Argumentation (elaborate, summarize, question-answer, clarify, agree with polarity level, disagree with polarity level)

The two middle levels, core speech acts and grounding, correspond to the DAMSL tagset.
Figure 2.1 shows a sample email thread from BC3, a corpus containing email conversations (see Chapter 3 for details). We can see that the conversation is a joint activity between the participants: they read the previous emails, understand the authors' intentions and meaning, and ground the speakers. They participate in the thread by stating different dialogue acts (e.g., S for statements, QY for yes-no questions).

2.1.2 Synchronous vs. Asynchronous Conversations

Conversations have a generative process. They start with common expressions such as "Hi", identifying the speakers, and bringing up a topic. From this point, speakers continue to participate in the discussion by stating dialogue acts, taking turns and grounding. However, the sequential nature of the conversation differs between spoken and written discussions.

[Figure 2.2: Sample phone conversation from SWBD.]

In spoken conversations, the discussion between the speakers is synchronized. The speakers hear each other's ideas and then state their opinions, so the temporal order of the utterances can be taken as the conversational structure in these types of conversations. Figure 2.2 shows a sample telephone conversation from SWBD, a corpus containing phone conversations. The speakers participate in the conversation with correct turn shifting: one speaker asks a question, and the listener answers it in the next dialogue act.
However, in written conversations such as email and forum messages, authors contribute to the discussion in a different order, and sometimes they do not pay attention to the content of previous posts. Therefore, the temporal order of the conversation cannot be used as the conversational structure in these domains, and we should employ other techniques to extract the underlying structure of these conversations. Figure 2.3 shows a sample email conversation from BC3. It is not obvious whether the third email is an answer to the first email or to the second one: although it was sent as a reply to the second email, it is actually an answer to the initial email. So we cannot take the temporal order of the emails as the conversational structure. In this thesis, we extract the underlying structure of asynchronous conversations by applying an appropriate technique.

[Figure 2.3: Sample email conversation from BC3.]
2.2 Related Work

All previous studies on dialogue act modeling focus on a specific domain and do not systematically analyze the effect of dialogue act modeling approaches on a comprehensive set of conversational domains. As far as we know, the present work is the first that proposes domain-independent supervised and semi-supervised dialogue act modeling techniques and analyzes the effectiveness of different dialogue act modeling methods on different modalities of conversations.

2.2.1 Unsupervised Dialogue Act Recognition

Among recent works on unsupervised dialogue act modeling, Ritter et al. [34] use a conversation model based on HMM for dialogue act modeling of Twitter conversations. To examine the performance of the model, they consider different numbers of acts between 5 and 40, and use Expectation Maximization (EM) with random initialization for training. The results show that this model has an inclination to cluster the posts into topical groups. To overcome the problem of topical clusters, they extend the Latent Dirichlet Allocation (LDA) framework to abstract away from topic words. In this model, each word is generated from one of the following sources: the current post's dialogue act, the conversation's topic, or general English. They use Gibbs sampling for inference in this model.

The models are evaluated both qualitatively and quantitatively. In the qualitative evaluation, they visualize the 10-act conversation topic model (the extended LDA model) with the top 40 words of each of the 10 clusters and the transition probabilities between the acts. They also evaluate the models quantitatively by using the trained models to predict the sequential structure of conversations. Overall, the Bayesian conversation model outperforms the EM conversation model in a conversation-ordering task.

Joty et al. [18] study the problem of unsupervised dialogue act modeling in asynchronous conversations such as emails and forums. They approach this problem in two ways: a graph-theoretic framework and a probabilistic conversation model. In the preprocessing step, they capture the conversational structure in email threads by extracting a Fragment Quotation Graph (FQG) [7] for each thread. In this process, the new and quoted fragments in each email are detected, which become the nodes of the FQG, and an edge is added between any new fragment and its neighboring quoted fragments. For forum conversations, each post is considered a reply to the initial post by default, unless it mentions other participants' names.

In the first approach, they attack the dialogue act modeling problem with a graph partitioning method. A similarity graph is built based on the similarity between sentences, and by adopting the N-mincut graph partitioning method they cluster the sentences into their dialogue acts (a sketch of the first metric appears after the list). The similarity metrics considered are:

- cosine of the angle between TFIDF Bag of Words (BOW) vectors
- the previous metric with nouns masked, to abstract away the topic words
- Word Subsequence Kernel
- Extended String Subsequence Kernel with Part of Speech (POS) tags of words
- dependency similarity
- syntactic similarity

However, none of these methods could beat the baseline, which is the majority class (statements).
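To make the first of these metrics concrete, the sketch below builds a sentence-similarity graph from TFIDF bag-of-words vectors. It is a minimal illustration using scikit-learn (our choice; the thesis does not specify an implementation), with illustrative sentences and an arbitrary edge threshold, not Joty et al.'s actual system.

    # Minimal sketch: TFIDF cosine-similarity graph over sentences, in the
    # spirit of the graph-theoretic framework described above.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    sentences = [
        "Could you let us know today if possible?",
        "We need to specify the map as soon as possible.",
        "I prefer the map to the globe.",
    ]

    tfidf = TfidfVectorizer().fit_transform(sentences)  # one row per sentence
    sim = cosine_similarity(tfidf)

    # Keep an edge (i, j) only when similarity exceeds a threshold; the
    # resulting graph would then be partitioned (e.g., with N-mincut).
    THRESHOLD = 0.1
    edges = [(i, j, round(sim[i, j], 3))
             for i in range(len(sentences))
             for j in range(i + 1, len(sentences))
             if sim[i, j] > THRESHOLD]
    print(edges)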
In the second approach, they employ an HMM conversation model with unigrams, length, speaker and position as features. The results show that this model suffers from topical clusters: it has an inclination to cluster sentences based on their topics rather than their dialogue acts. To handle the problem of topical clusters in the HMM, they define the emission distribution as a mixture model (HMM+Mix). Expectation Maximization with multiple restarts is used for learning the model, and the Viterbi algorithm is employed for predicting the dialogue act sequence. The results show that the Hidden Markov Model-Mixture Model (HMM+Mix) performs better than the HMM on both the email and forum datasets.

Paul [31] investigates the problem of unsupervised dialogue act modeling on two asynchronous conversation types, forum and Twitter posts. The intuition of the approach is to combine the useful factors of HMM and LDA for the task of dialogue act modeling.

In LDA, each message can be generated by a mixture of classes, but LDA cannot capture the sequential conversational structure as an HMM can. They incorporate the conversational flow into the LDA model by defining a message's class distribution to depend on the class assignment of the previous message. They adopt Monte Carlo EM for inference, alternating between one iteration of Gibbs sampling and one iteration of gradient ascent for optimizing the transition parameters. Comparing perplexity on the test data, M4 (Mixed Membership Markov Model) performs significantly better than HMM, but LDA outperforms both. Paul argues that LDA performs better because it does not consider the previous block when generating the class distributions, so its parameters fit the data better. In another evaluation, a thread reconstruction task, M4 performs better than HMM on both the Twitter and forum corpora.

Summary of unsupervised dialogue act recognition related works (Table 2.1):

Ritter et al. [34]
  Type:       Twitter
  Corpus:     Twitter
  Level:      Post
  Tagset:     0
  Features:   unigrams
  Evaluation: conversation ordering; visualization of the transition graph
  Learner:    Latent Dirichlet Allocation; each word is generated from one of the following sources: current post's dialogue act, conversation's topic, general English; Gibbs sampling and Expectation Maximization

Joty et al. [18]
  Type:       Email, Forum
  Corpus:     W3C, BC3, TripAdvisor
  Level:      Sentence
  Tagset:     0 / 12 tags / 0
  Features:   unigrams, speaker, length, position
  Evaluation: accuracy of clustering (one-to-one mapping)
  Learner:    graph-theoretic framework; HMM and HMM+Mix model

Paul [31]
  Type:       Forum, Twitter
  Corpus:     CNET, Twitter
  Level:      Post
  Tagset:     12 tags / 0
  Features:   unigrams
  Evaluation: conversation ordering; visualization of the transition graph; perplexity; clustering (variation of information metric)
  Learner:    M4 (combination of HMM and LDA)

Table 2.1: Unsupervised dialogue act recognition related works.

Although there have been several studies on unsupervised dialogue act modeling, none of them has investigated dialogue act recognition across different conversational modalities. In this thesis, we employ unsupervised techniques for dialogue act modeling on different conversational domains. The features that we consider for unsupervised dialogue act modeling are unigrams, as in [18, 31, 34]. Furthermore, like the previous works [18, 31], we take the dialogue act dependencies into account in our unsupervised methods.
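Since both the unsupervised models above and the structured predictors used later in this thesis rest on this sequential view of a conversation, a minimal Viterbi decoder over dialogue act states may help fix ideas. The two-state transition and emission tables below are toy values of our own, not parameters learned from any corpus.

    import math

    # Toy HMM over two dialogue act states; all probabilities are
    # illustrative, not estimated from data.
    states = ["S", "QY"]                    # Statement, Yes-no question
    start = {"S": 0.8, "QY": 0.2}
    trans = {"S": {"S": 0.7, "QY": 0.3},    # P(next act | current act);
             "QY": {"S": 0.9, "QY": 0.1}}   # questions tend to be answered
    emit = {"S": {"w_decl": 0.8, "w_wh": 0.2},
            "QY": {"w_decl": 0.3, "w_wh": 0.7}}

    def viterbi(obs):
        """Most likely dialogue act sequence for an observation sequence."""
        V = [{s: math.log(start[s]) + math.log(emit[s][obs[0]]) for s in states}]
        back = []
        for o in obs[1:]:
            row, ptr = {}, {}
            for s in states:
                prev = max(states, key=lambda p: V[-1][p] + math.log(trans[p][s]))
                row[s] = V[-1][prev] + math.log(trans[prev][s]) + math.log(emit[s][o])
                ptr[s] = prev
            V.append(row)
            back.append(ptr)
        path = [max(states, key=lambda s: V[-1][s])]
        for ptr in reversed(back):          # follow back-pointers
            path.append(ptr[path[-1]])
        return list(reversed(path))

    print(viterbi(["w_wh", "w_decl", "w_decl"]))   # -> ['QY', 'S', 'S']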
2.2.2 Supervised Dialogue Act Recognition

There have been several studies on supervised dialogue act modeling. To the best of our knowledge, none of them compares the performance of dialogue act recognition on different synchronous (e.g., meeting and phone) and asynchronous (e.g., email and forum) conversations; most analyze dialogue act modeling in a single domain.

Carvalho and Cohen [10] propose classifying emails into their dialogue acts according to two ontologies, one for nouns and one for verbs. Email messages are classified into several verbs such as request, propose, deliver, and commit, and into two aggregations of verbs: a set of commissive acts (deliver and commit) and a set of directive acts (request, propose and amend). They are also classified based on two nouns, meeting and data. They investigate the problem of classifying email messages into their speech acts with a collective classification method. In the first step, eight maximum entropy classifiers are trained from the training set, one for each speech act, with features including the words of the email body, the words in the email subject, and relational features (16 boolean features which indicate the email acts of the parent and the child of the current message).

In the inference process, the test set is initialized with the email acts predicted by content-only classifiers. Then the following procedure is iterated: for each message, the prediction confidence of each of the eight email acts is computed using the eight classifiers, and if it is higher than a threshold, the act is updated. During the process, this threshold decreases linearly with the iteration number. They show that performance improves for some email acts such as commissive, meet and commit. On the other hand, there is no statistically significant improvement for the deliver and data email acts, because they are very frequent after any speech act.

Shrestha and McKeown [37] also study the problem of dialogue act modeling in email conversations, considering the two dialogue acts of question and answer. Their approach is divided into two main steps: automatic question detection and automatic answer detection. For detecting questions, they use the part-of-speech tags of the first and last five terms of the utterance, the length of the utterance, and the 100 most discriminative POS bigrams, using Ripper to learn the rules for this task. In the next step, they detect the answer to each question at the segment level. The feature vector combines some standard features (e.g., cosine similarity and Euclidean distance), thread structure features, and features based on the other candidate answer segments.

The results show that this approach works well for detecting interrogative questions but not declarative or rhetorical questions, which are particularly important in email conversations, since people usually try to be polite in their emails and make requests in a declarative form.

Ravi and Kim [33] present a dialogue act recognition method for detecting questions and answers in educational discussions. They annotate a discussion board of an undergraduate computer science course for this task. The annotation schema contains the speech acts question, elaboration, information, answer, correction, and acknowledgement. In the preprocessing step, some categories of terms are replaced with their corresponding category names, such as technical terms, common words, word sequences, informal words and typographical symbols. In the learning phase, two separate SVM classifiers are trained for detecting questions and answers. The classifiers' features are unigrams, bigrams, trigrams and quadrograms. Moreover, the feature space is reduced by ranking features based on their Information Gain (IG) values.
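As an illustration of this kind of feature ranking, the sketch below scores binary features by information gain, i.e., the reduction in label entropy. The toy sentences and features are ours, not Ravi and Kim's data.

    import math
    from collections import Counter

    def entropy(labels):
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def information_gain(feature_present, labels):
        """IG of a binary feature: H(Y) - H(Y | feature)."""
        gain = entropy(labels)
        for value in (True, False):
            subset = [y for x, y in zip(feature_present, labels) if x == value]
            if subset:
                gain -= (len(subset) / len(labels)) * entropy(subset)
        return gain

    # Toy data: Q = question, S = statement.
    labels      = ["Q", "Q", "S", "S", "S", "Q"]
    has_qmark   = [True, True, False, False, False, True]   # ends with "?"
    has_can_you = [True, False, False, True, False, False]  # contains "can you"

    for name, feat in [("ends with ?", has_qmark), ("can you", has_can_you)]:
        print(name, round(information_gain(feat, labels), 3))
    # -> "ends with ?" 1.0 (perfectly separates), "can you" 0.0 (uninformative)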
Ferschke et al. [13] apply dialogue act modeling to Wikipedia discussions to analyze the collaborative process of editing Wikipedia pages. They propose an annotation schema with 17 dialogue acts for their Wikipedia corpus, which contains 100 Talk pages of English Wikipedia. For each label, they train Naïve Bayes, J48 (a decision tree algorithm) and SMO (an optimization algorithm for SVM), and insert the best learner into the classification pipeline. The features for the learners are all the unigrams, bigrams and trigrams that occur in at least three distinct turns, the temporal distance to the previous and next turns, the lengths of the current, previous and next turns, the position of the turn in the discussion, and the indentation level of the turn. In order to capture the conversational structure, they also add the n-grams of the previous and the next turn as features. They also reduce the number of features using the chi-square (X2) metric and Information Gain (IG) values.

Overall, the best non-lexical features prove to be the indentation level of the turn, the temporal distance to the previous turn, and the turn position within the topic. Furthermore, comparing the final results with the human annotation agreement, for several labels the classification outperforms the human performance; we can conclude that the sentences of these labels have common features.

Kim et al. [22] study the task of supervised classification of dialogue acts in one-to-one online chats in the shopping domain. They use three groups of features: n-gram features (unigrams, bigrams and trigrams), structural information (author, relative position in the chat, and turn-relative position among utterances in a turn), and utterance dependency (dialogue act of the previous utterance, accumulated dialogue acts of previous utterances, and dialogue act of the previous utterance by the same author). They use bag-of-words features with binary, Term Frequency (TF), Term Frequency Inverse Document Frequency (TFIDF) and Information Gain representations, over raw and lemmatized tokens. In the learning phase, they employ three machine learners, SVM-HMM, CRF and Naïve Bayes, to classify utterances into 12 speech acts.

Among the three classifiers, CRF outperforms the other learners because it considers the sequential dependency between utterances. Moreover, the best features for classifying dialogue acts in this medium prove to be unigrams (with binary representation), author, relative position in the chat, dialogue act of the previous utterance, and dialogue act of the previous utterance by the same author.

Cohen et al. [11] propose classifying emails into speech acts based on the intention of the author. Considering each email to have several possible speech acts, they assume two ontologies, for nouns and verbs, which are used to determine the speech acts of each single email as verb-noun pairs. These ontologies are designed based on a large email corpus in the office domain. The verb ontology contains request, propose, amend, commit and deliver as verb tags, and the noun ontology includes the two main categories of information and activities. They investigate the effectiveness of different features: unigrams with TFIDF weighting, bigrams, time expressions (e.g., "before"), part-of-speech tags, and the words adjacent to pronouns or proper nouns.
Among these features, bigrams prove effective for dialogue act modeling in this medium, whereas TFIDF proves to be an inappropriate weighting for unigrams. The other features perform differently for each speech act.

They also study the effectiveness of different machine learning algorithms for dialogue act modeling on their corpus. They represent the feature vector as unweighted word frequency counts and use the voted perceptron algorithm, a Decision Tree (DT), an AdaBoost learner (with DTs as the weak learners), and an SVM with linear kernel, learning one classifier per speech act. However, none of these large-margin classifiers could perform well on all the speech acts.

Sun and Morency [39] propose a different approach for recognizing dialogue acts in multiparty meetings. They apply a reweighted domain adaptation technique [14] to dialogue act recognition in order to balance the effect of speaker-specific and other speakers' data. The critical assumption about the data in speaker adaptation is that it should be more similar to data from the same speaker than to data from another speaker. They investigate this assumption on their dataset, the ICSI-MRDA corpus, and observe three major differences between distinct groups of speaker-specific data: conflicts (the same words are used by different speakers but with contrary intentions), different word distributions, and different label distributions. They employ a Maximum Entropy model with unigram, bigram and trigram features.

The results show that applying reweighted domain adaptation to adapt a speaker-specific dialogue act recognizer not only achieves promising results but also obviates the need for more speaker-specific data in speaker adaptation approaches.
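One minimal way to realize such instance reweighting, assuming a scikit-learn-style classifier rather than Sun and Morency's actual adaptation machinery, is to downweight out-of-speaker examples at training time; the weight values, utterances and classifier choice below are all illustrative.

    # Sketch of reweighted training: utterances from the target speaker
    # count more than utterances from other speakers. Sun and Morency use
    # a MaxEnt model with n-gram features; the fixed 0.3 weight here is an
    # arbitrary stand-in for weights set by their adaptation technique.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression

    utterances = ["yeah right", "so the main point is",
                  "uh huh", "I think we should start"]
    acts       = ["B", "S", "B", "S"]
    speakers   = ["A", "B", "A", "B"]

    target_speaker = "B"
    weights = np.array([1.0 if s == target_speaker else 0.3 for s in speakers])

    X = CountVectorizer(ngram_range=(1, 2)).fit_transform(utterances)
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X, acts, sample_weight=weights)   # per-instance weights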
Summary of supervised dialogue act recognition related works (Table 2.2):

Carvalho and Cohen [10]
  Type:     Email
  Corpus:   CSpace
  Level:    Email
  Tagset:   noun and verb taxonomies
  Features: words of the email body, email subject, relational features (16 boolean features)
  Learner:  binary MaxEnt and collective classification (not good for deliver and data acts)

Shrestha and McKeown [37]
  Type:     Phone, Email
  Corpus:   SWBD (train), ACM corpus (test)
  Level:    Segment
  Tagset:   DAMSL, 2 tags (question and answer)
  Features: questions: POS of the first and last five terms, length, 100 POS bigrams; answers: standard features (cosine similarity, Euclidean distance), thread structure (distances), features based on other candidate answers (distances)
  Learner:  Ripper (good for interrogative questions but not declarative or rhetorical questions)

Ravi and Kim [33]
  Type:     Forum
  Corpus:   educational forum
  Level:    Message
  Tagset:   2 tags (question and answer)
  Features: 1-/2-/3-/4-grams, feature selection (Information Gain value); preprocessing: words replaced with categories
  Learner:  two SVMs (question and answer)

Ferschke et al. [13]
  Type:     Chat
  Corpus:   Wikipedia Talk pages
  Level:    Turn
  Tagset:   17 tags
  Features: 1-/2-/3-grams in at least 3 distinct turns, temporal distance to the previous and next turns, length of the current, previous and next turns, position of the turn in the discussion, indentation level of the turn, n-grams of the previous and the next turn; feature selection (X2 metric and IG)
  Learner:  binary classification (Naïve Bayes, J48 and SMO)

Kim et al. [22]
  Type:     Chat
  Corpus:   chat in shopping domain
  Level:    Utterance
  Tagset:   12 tags
  Features: 1-/2-/3-grams, author, relative position in chat, turn-relative position, dialogue act of previous utterance, accumulated dialogue acts of previous utterances, dialogue act of previous utterance by the same author
  Learner:  CRF, Naïve Bayes, SVM-HMM

Cohen et al. [11]
  Type:     Email
  Corpus:   CSpace, PW CALO
  Level:    Email
  Tagset:   noun and verb taxonomies
  Features: unigrams with/without TFIDF weighting, bigrams, time expressions, part-of-speech tags, words adjacent to pronouns or proper nouns
  Learner:  voted perceptron algorithm, decision tree (DT), AdaBoost learner (DT as the weak learners), and SVM with linear kernel

Sun and Morency [39]
  Type:     Meeting
  Corpus:   MRDA
  Level:    Utterance
  Tagset:   MRDA tagset
  Features: 1-/2-/3-grams
  Learner:  MaxEnt (speaker adaptation)

Table 2.2: Supervised dialogue act recognition related works.

There have been several works on supervised dialogue act modeling. However, as far as we know, this thesis is the first study that investigates the performance of domain-independent supervised dialogue act modeling techniques on different conversational domains. In addition, most of the previous works have not considered the dialogue act dependencies in the conversations; we take these dependencies into account by using structured predictors for labeling the sentences. The features used in this work are a set of domain-independent features that have shown their effectiveness in previous supervised dialogue act modeling studies.

2.2.3 Semi-supervised Dialogue Act Recognition

More recently, researchers have studied dialogue act recognition in a semi-supervised manner. Jeong et al. [15] propose a semi-supervised dialogue act modeling approach. The method exploits meeting and phone conversations to extract the best features for dialogue act classification in the email and forum domains. Each sentence of the phone and meeting conversations is represented as a set of trees (a part-of-speech tree, an n-gram tree and a dependency tree). A boosting algorithm then iteratively learns the best subtrees that minimize error on the training data. As a baseline, they employ a Maximum Entropy classifier which uses these subtree features extracted from phone and meeting conversations.

Since the language used in written conversations differs from the language used in spoken conversations, they employ two semi-supervised techniques (bootstrapping and boosting algorithms) to find new subtree features from unlabeled email and forum data. The semi-supervised methods show promising results compared to the baseline. In particular, semi-supervised boosting is competitive with the supervised method on the forum dataset, which employs a Maximum Entropy classifier with subtree features extracted from the email and phone conversations.
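Since the same bootstrapping idea recurs in our own semi-supervised experiments (Section 4.4.1), a generic self-training loop is sketched below. The base classifier, confidence threshold and stopping rule are our own illustrative choices, not the exact setup of Jeong et al.

    # Generic bootstrapping (self-training): repeatedly add confident
    # predictions on unlabeled data to the training set and retrain.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def self_train(X_lab, y_lab, X_unlab, threshold=0.9, rounds=5):
        X_lab = np.asarray(X_lab)
        y_lab = list(y_lab)
        pool = np.asarray(X_unlab)
        clf = LogisticRegression(max_iter=1000)
        for _ in range(rounds):
            clf.fit(X_lab, y_lab)
            if len(pool) == 0:
                break
            proba = clf.predict_proba(pool)
            confident = proba.max(axis=1) >= threshold
            if not confident.any():
                break                      # nothing confident enough; stop
            # Move confident examples, with their predicted labels,
            # from the unlabeled pool into the training set.
            X_lab = np.vstack([X_lab, pool[confident]])
            y_lab += list(clf.classes_[proba[confident].argmax(axis=1)])
            pool = pool[~confident]
        clf.fit(X_lab, y_lab)
        return clf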
Zhang et al. [49] employ two semi-supervised learning methods, Support Vector Machines and graph-based label propagation, for dialogue act recognition on Twitter. Their approach addresses the scarcity of training data for tweets. The five speech acts defined in their work are statement, question, suggestion, comment and miscellaneous. The feature set contains lexical features such as speech act-specific cues, special words (abbreviations and acronyms, opinion words, vulgar words, and emoticons), and special characters (Twitter-specific characters and some punctuation). To turn the binary classifiers into a multi-class scheme, the label of the binary classifier with the highest classification score is chosen as the prediction. In the graph-based method, the similarity weights in the graph are computed using a Gaussian function, and the graph is converted to a kNN graph by omitting edges with a weight lower than a threshold.

They show that SVM performs better than graph-based label propagation on this problem. Analyzing the results, they conclude that the similarity assumption of graph-based label propagation, that similar tweets have similar speech acts, does not hold. Moreover, inaccurate scores produced by the graph-based label propagation method result in incorrect labels in the single-class to multi-class conversion. They also show that increasing the labeled data and the labeled/unlabeled ratio does not improve the performance of the semi-supervised approaches.

Summary of semi-supervised dialogue act recognition related works (Table 2.3):

Jeong et al. [15]
  Type:     train: Phone, Meeting; test: Email, Forum
  Corpus:   SWBD, MRDA, Enron, BC3, TripAdvisor
  Level:    Sentence
  Tagset:   12 tags
  Features: discriminative subtrees of n-grams, dependency trees, POS tags, speaker; for email and forum: message type (initial or reply), authorship (2nd or 3rd post by the same author), position of the sentence
  Learner:  baseline: train only on SWBD and MRDA; bootstrapping: 100 near neighbors, tree edit distance; semi-supervised boosting; supervised (the semi-supervised methods beat the baseline and are competitive with supervised learning on forum data)

Zhang et al. [49]
  Type:     Twitter
  Corpus:   Twitter
  Level:    Post
  Tagset:   5 tags
  Features: speech act-specific cues, special words (abbreviations, acronyms, opinion words, vulgar words, emoticons), special characters of Twitter and some punctuation
  Learner:  SVM and graph-based label propagation (Gaussian function for similarity weights)

Table 2.3: Semi-supervised dialogue act recognition related works.

Compared to supervised and unsupervised dialogue act modeling, there have been few studies on semi-supervised dialogue act recognition, and these previous works focus only on a specific conversational domain. In this thesis, we propose a domain-independent semi-supervised dialogue act modeling technique and analyze its performance on different conversational modalities.

Chapter 3
Corpora

This chapter explains the corpora used for dialogue act modeling and describes the conversational structure of spoken and written conversations.

3.1 Corpora

Gathering conversational corpora for dialogue act modeling is an expensive and time-consuming task, and due to privacy issues there are few available conversational datasets.

For asynchronous conversations, we use available corpora of email and forum discussions. For synchronous domains, we employ available corpora of multi-party meeting and phone conversations.

3.1.1 Email Conversations

BC3 (Email): As the labeled dataset for email conversations, we use BC3, which contains 40 threads from the W3C corpus [45]. Each thread has been annotated by three different annotators. The annotation consists of extractive summaries, abstractive summaries with linked sentences, speech acts (Propose, Request, Commit, Meeting), meta sentences and subjectivity.

BC3 has also been annotated with twelve domain-independent dialogue acts, mainly adopted from the MRDA tagset. This version of BC3 has been used in several previous works for unsupervised [18] and semi-supervised [15] dialogue act modeling. In this thesis, we use BC3 for supervised, unsupervised and semi-supervised dialogue act recognition. This annotation was performed by two human annotators, and the inter-annotator agreement was 0.79. In our work, we use the second version of BC3, with twelve domain-independent dialogue acts, to be able to compare results across datasets.

Table 3.1 shows the tagset of BC3 with twelve domain-independent dialogue acts.
Tag  Dialogue Act
S    Statement
P    Polite mechanism
QY   Yes-no question
AC   Action motivator
QW   Wh-question
A    Accept response
QO   Open-ended question
AA   Acknowledge and appreciate
QR   Or/or-clause question
R    Reject response
U    Uncertain response
QH   Rhetorical question

Table 3.1: BC3 tagset.

W3C (Email): We exploit the W3C corpus (http://research.microsoft.com/enus/um/people/nickcr/w3csummary.html) as the unlabeled data for semi-supervised dialogue act modeling on email conversations. Joty et al. [18] used the W3C corpus for unsupervised dialogue act modeling. The W3C corpus is the result of crawling nearly 50,000 threads from the World Wide Web Consortium's sites at w3c.org. The data consist of mailing lists, public webpages, and text derived from .pdf, .doc and .ppt files. The mailing-list part of W3C consists of almost 200,000 documents. TREC participants have provided reply-to relations and subject overlap for this part of the dataset.

3.1.2 Forum Conversations

CNET (Forum): As the labeled forum dataset we use the available CNET corpus, which is annotated with 11 domain-independent dialogue acts at the post level [23]. This corpus consists of 320 threads and a total of 1332 posts, mostly from technical forums (the Operating System, Software, Hardware, and Web Development sub-forums of CNET).

Two annotators annotated the corpus, also giving multiple labels and links for each post. Cohen's Kappa for the post label and link annotations was 0.59 and 0.78, respectively.

The CNET dataset has been used by Paul [31] for unsupervised dialogue act modeling. We use this dataset for supervised, semi-supervised and unsupervised dialogue act recognition.

For posts with multiple dialogue acts, we simply consider the first label as the gold standard. Consequently, one of the tags (ANSWER-CORRECTION) is eliminated in our work, so our tagset for the CNET dataset contains 11 dialogue acts.

CNET dialogue acts and their definitions from [23]:

- QUESTION-QUESTION (QQ): the post contains a new question, independent of the thread context that precedes it. In general, QUESTION-QUESTION is reserved for the first post in a given thread.
- QUESTION-ADD (QA): the post supplements a question by providing additional information, or asking a follow-up question.
- QUESTION-CONFIRMATION (QCN): the post points out error(s) in a question without correcting them, or confirms details of the question.
- QUESTION-CORRECTION (QCC): the post corrects error(s) in a question.
- ANSWER-ANSWER (AA): the post proposes an answer to a question.
- ANSWER-ADD (AD): the post supplements an answer by providing additional information.
- ANSWER-CONFIRMATION (AC): the post points out error(s) in an answer without correcting them, or confirms details of the answer.
- ANSWER-CORRECTION (A-CORR): the post corrects error(s) in an answer. This dialogue act is not considered in our work; it is eliminated in the process of choosing a single label for sentences with multiple tags.
- ANSWER-OBJECTION (AO): the post objects to an answer on experiential or theoretical grounds (e.g., "It won't work.").
- RESOLUTION (RS): the post confirms that an answer works, on the basis of implementing it.
- REPRODUCTION (RP): the post either (1) confirms that the same problem is being experienced (by a non-initiator, e.g., "I'm seeing the same thing."), or (2) confirms that the answer should work.
- OTHER (O): the post does not belong to any of the above classes.
BC3 blog (Forum): We exploit the BC3 blog corpus (http://www.cs.ubc.ca/nest/lci/bc3.html) as unlabeled data for semi-supervised dialogue act recognition of forum discussions. As far as we know, this thesis is the first that uses the BC3 blog corpus for dialogue act modeling. The corpus includes 7000 blog conversations from six popular websites (Slashdot, DailyKos, AndroidCentral, BusinessInsider, Macrumors and TSN). These blog sites cover the varied categories of technology, business, politics and sports.

TripAdvisor (Forum): We use data crawled from the TripAdvisor website for unsupervised dialogue act modeling. TripAdvisor has been used in several previous studies for unsupervised [18] and semi-supervised [15] dialogue act modeling.

3.1.3 Meeting Conversations

ICSI-MRDA (Meeting): The ICSI-MRDA dataset is used as labeled data for meeting conversations. It contains 75 meetings, each nearly an hour in length, with 53 unique speakers and 6 speakers per meeting [38]. The 75 meetings include group discussions of the International Computer Science Institute (ICSI) meeting recorder project itself, meetings of a research group focused on robustness in automatic speech recognition, discussions about natural language processing and neural theories of language, miscellaneous meeting types, and meetings between the corpus transcribers as participants.

The corpus is annotated by three labelers with a Kappa value of 0.8. The ICSI-MRDA annotation scheme requires one general tag per sentence, followed by a variable number of specific tags; there are 11 general tags and 39 specific tags. We reduce the tagset to the 11 general tags to be consistent with the other datasets.

Table 3.2 shows the tagset of MRDA with 11 general dialogue acts.

Tag  Dialogue Act
S    Statement
QY   Yes-no question
QO   Open-ended question
QW   Wh-question
B    Backchannel
FH   Floor holder
FG   Floor grabber
H    Hold
QRR  Or clause after Yes-No question
QR   Or question
QH   Rhetorical question

Table 3.2: Reduced MRDA tagset.

Sun and Morency [39] and Jeong et al. [15] used the MRDA corpus for supervised and semi-supervised dialogue act modeling, respectively. In this thesis, we use this dataset for supervised and semi-supervised dialogue act recognition.

AMI (Meeting): As unlabeled multi-party meeting conversations, we use AMI, which includes 100 hours of meeting recordings [8]. The meetings are between members of a design team. The corpus is annotated with named entities, dialogue acts, summaries, and also gaze and head movement. We use this dataset for semi-supervised dialogue act modeling. As far as we know, this is the first study that uses the AMI corpus for dialogue act recognition.

3.1.4 Phone Conversations

SWBD (Phone): In addition to multi-party meeting conversations, we also report experimental results on Switchboard-DAMSL (SWBD), a large-scale corpus of telephone speech [19]. This corpus is annotated with the SWBD-DAMSL tagset, which consists of 220 tags. The Kappa value for the eight annotators of the corpus is 0.8. The main purpose of labeling the Switchboard conversations was to learn stochastic discourse grammars for building better language models for automatic speech recognition.

Tag  Dialogue Act
S    Statement
P    Polite mechanism
QY   Yes-no question
AC   Action motivator
QW   Wh-question
A    Accept response
QO   Open-ended question
AA   Acknowledge and appreciate
QR   Or/or-clause question
R    Reject response
U    Uncertain response
QH   Rhetorical question
Z    Hedge
B    Backchannel
D    Self-talk
C    Signal-non-understanding

Table 3.3: Reduced SWBD tagset.

We use the mapping table presented by Jeong et al. [15] to reduce the tagset to 16 domain-independent dialogue acts. Moreover, due to the lack of available datasets in this domain, we divide SWBD into labeled and unlabeled data for the bootstrapping approach. Table 3.3 shows the tagset of SWBD with 16 dialogue acts.
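Mechanically, such a tagset reduction is a many-to-one lookup applied to every utterance label. The sketch below shows the shape of such a mapping with a few hypothetical entries; the authoritative 220-tag table is the one published by Jeong et al. [15], not this fragment.

    # Illustrative fragment of a SWBD-DAMSL -> reduced-tagset lookup.
    # The entries below are plausible examples of the mapping's shape,
    # not an authoritative subset of Jeong et al.'s table [15].
    REDUCE = {
        "sd": "S",   # statement-non-opinion -> Statement
        "sv": "S",   # statement-opinion     -> Statement
        "qy": "QY",  # yes-no question       -> Yes-no question
        "b":  "B",   # backchannel           -> Backchannel
        "ft": "P",   # thanking              -> Polite mechanism
    }

    def reduce_tag(fine_tag, default="S"):
        """Map a fine-grained SWBD-DAMSL tag onto the reduced tagset.

        Falling back to Statement for unmapped tags is our assumption,
        reflecting that S is by far the majority class (Table 3.4)."""
        return REDUCE.get(fine_tag, default)

    print(reduce_tag("qy"), reduce_tag("sv"))  # -> QY S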
Moreover, due to the lack of available datasets in this domain, we divide SWBD into labeled and unlabeled data for the bootstrapping approach. Table 3.3 shows the tagset of SWBD with the 16 dialogue acts.

The SWBD corpus has been used in previous work for supervised [37] and semi-supervised [15] dialogue act modeling. In this thesis, we also use this corpus for supervised and semi-supervised dialogue act recognition.

Tag  Dialogue Act                      Email   Forum   Meeting  Phone
                                       (BC3)   (CNET)  (MRDA)   (SWBD)
A    Accept response                   2.07%   -       -        6.96%
AA   Acknowledge and appreciate        1.24%   -       -        2.12%
AC   Action motivator                  6.09%   -       -        0.38%
P    Polite mechanism                  6.97%   -       -        0.12%
QH   Rhetorical question               0.75%   -       0.34%    0.25%
QO   Open-ended question               1.32%   -       0.17%    0.3%
QR   Or/or-clause question             1.10%   -       -        0.2%
QW   Wh-question                       2.29%   -       1.63%    0.95%
QY   Yes-no question                   6.75%   -       4.75%    2.62%
R    Reject response                   1.06%   -       -        1.03%
S    Statement                         69.56%  -       66.47%   46.44%
U    Uncertain response                0.79%   -       -        0.15%
Z    Hedge                             -       -       -        11.55%
B    Backchannel                       -       -       14.44%   26.62%
D    Self-talk                         -       -       -        0.1%
C    Signal-non-understanding          -       -       -        0.14%
FH   Floor holder                      -       -       7.96%    -
FG   Floor grabber                     -       -       2.96%    -
H    Hold                              -       -       0.76%    -
QRR  Or clause after Yes-No question   -       -       0.38%    -
QR   Or question                       -       -       0.2%     -
QQ   Question-Question                 -       27.92%  -        -
QA   Question-Add                      -       11.67%  -        -
QCN  Question-Confirmation             -       3.89%   -        -
QCC  Question-Correction               -       0.36%   -        -
AA   Answer-Answer                     -       36.75%  -        -
AD   Answer-Add                        -       8.84%   -        -
AC   Answer-Confirmation               -       0.36%   -        -
RP   Reproduction                      -       0.71%   -        -
AO   Answer-Objection                  -       1.07%   -        -
RS   Resolution                        -       7.78%   -        -
O    Other                             -       0.71%   -        -

Table 3.4: Dialogue act categories and their relative frequency in all the labeled corpora.

Table 3.4 shows the dialogue acts of all the corpora and their relative frequencies. The table shows that the distribution of dialogue acts in the datasets is not balanced: most of the utterances are labeled as statements. In addition, all the available corpora are annotated with dialogue acts at the sentence level. The only exception is the CNET forum dataset, in which annotation is at the post level.

3.2 Conversational Structure

Adjacent utterances in a conversation have a strong correlation in terms of their dialogue acts. For example, if speaker 1 asks a question of speaker 2, there is a high probability that the next utterance of the conversation will be an answer from speaker 2. Therefore, the conversational structure is a paramount factor that should be taken into account for automatic dialogue act modeling. This structure differs in spoken and written discussions. In spoken conversations, the discussion between the speakers is synchronized: the speakers hear each other's ideas and then state their opinions, so the temporal order of the utterances can serve as the conversational structure. However, in written conversations such as email and forum messages, authors contribute to the discussion in a different order, and sometimes they do not pay attention to the content of previous posts. Therefore, the temporal order of the conversation cannot be used as the conversational structure in these domains, and appropriate techniques should be used to extract the underlying structure of these conversations. To this aim, when reply links are available in the dataset, we use them to capture the conversational structure. To obtain a conversational structure that is often even more refined than the reply links, we build the Fragment Quotation Graph, following the procedure proposed by Joty et al. [18] to extract the graph structure of a thread.
3.2.1 Fragment Quotation Graph

The Fragment Quotation Graph is a graph-based structure that captures the correct sequence of utterances in asynchronous conversations. We extract the graph structure of email conversations in three steps. In the first step, we find the new and quoted fragments of the conversation; quoted fragments are identified by the quotation markers (e.g., ">", "&gt;"). In the second step, these fragments are compared to each other and the distinct fragments are identified; in this step, some fragments might be split into several distinct fragments. In the last step, we determine the edges between the fragments, which show the referential relations between them. To identify the edges, each new fragment is assumed to be a reply to its neighboring quoted fragments. In addition, we remove the redundant edges when we have transitive relations between several fragments. If an email does not quote any text, we use its reply link to create edges between its fragments and the fragments of the email to which it replies.

Figure 3.1 shows an example of email conversations and their corresponding Fragment Quotation Graph.

[Figure 3.1: Email conversations and their corresponding Fragment Quotation Graph.]

In the first step of finding new and quoted fragments, in E2, c is a new fragment, and a and b are quoted fragments with depth 1. Similarly, in E5, d and e are quoted fragments with depth 2.

In the second step of comparing fragments, the fragment de in E3 is compared to d and e in E4, and is split into the two fragments d and e.

In the last step of finding the edges, we create edges between new fragments and their neighboring quoted fragments. For example, in E6, the new fragment j receives two edges, to g and to h. Finally, we remove the redundant edges. For instance, in E5, we found edges between (h,d), (h,i) and (h,e), but there were already edges between (i,e) and (i,d), so we remove the redundant transitive edges (h,d) and (h,e).

It is assumed that connected fragments in the Fragment Quotation Graph have a referential relation, and that the sentences in these fragments might have a dialogue act dependency on each other. In order to capture this sequential dependency, the conversational structure is built from all the paths in the graph. Consequently, the sentences in shared nodes appear in several paths, and their predicted dialogue acts may differ across different sequences. To assign a single dialogue act to these duplicated sentences, we simply take the majority vote over their predictions.
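To make the construction concrete, below is a minimal sketch of steps one and three under simplifying assumptions: each email is represented as an ordered list of (fragment, quotation depth) pairs, fragment splitting (step two) is assumed already done, and the reply-link fallback is omitted. The representation and helper names are our own, not the exact implementation of [18].

```python
def build_fragment_quotation_graph(emails):
    """Sketch of FQG edge creation (steps 1 and 3 only).

    Each email is a list of (fragment_id, depth) pairs in textual order,
    where depth 0 marks new text and depth >= 1 marks quoted text.
    """
    edges = set()
    for email in emails:
        for i, (frag, depth) in enumerate(email):
            if depth != 0:
                continue
            # A new fragment is assumed to reply to its neighbouring
            # quoted fragments (the ones just above and below it).
            for j in (i - 1, i + 1):
                if 0 <= j < len(email) and email[j][1] > 0:
                    edges.add((frag, email[j][0]))
    # Remove redundant (transitive) edges: drop a->c if a->b and b->c exist.
    redundant = {(a, c) for (a, b) in edges for (b2, c) in edges
                 if b == b2 and (a, c) in edges}
    return edges - redundant

# Toy example mirroring Figure 3.1: E6 contains new fragment j quoting g and h.
emails = [[("g", 1), ("j", 0), ("h", 1)]]
print(build_fragment_quotation_graph(emails))  # {('j', 'g'), ('j', 'h')}
```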
Chapter 4

Methodology

In this chapter, we describe the domain-independent feature set used for dialogue act modeling, and the classifiers employed in our machine learning experiments.

4.1 Features

In defining the feature set, we have two primary criteria: domain independence, and demonstrated effectiveness in previous work. From previous work, we have gathered the features used and identified the most effective ones. We group the features into five categories: lexical features, temporal features, length features, structural features, and features related to dialogue acts. The effective features are marked with a star.

Lexical

- unigrams (words)* [10, 11, 13, 18, 22, 31, 33, 34, 39]
- bigrams (two adjacent words)* [11, 13, 22, 33, 39]
- trigrams (three adjacent words) [13, 22, 33, 39]
- n-grams of the previous and the next turn [13]
- discriminative subtrees of n-grams*: This feature was used by Jeong et al. [15] for semi-supervised dialogue act modeling. As explained in Chapter 2, Jeong et al. represent each sentence of the corpus as a tree of n-grams; then, using a boosting algorithm, they iteratively learn the best subtrees that minimize the error on the training data.
- discriminative subtrees of dependency trees*: Jeong et al. [15] also represent each sentence as a dependency tree and extract the most discriminative features using their boosting algorithm. A dependency tree shows the dependency relations between the words of a sentence.
- part-of-speech tags [11, 15]
- for detecting questions: part-of-speech tags of the first and the last five terms of the utterance [37]
- the 100 most discriminative POS-bigrams*: Shrestha and McKeown [37] use POS-bigrams as features for recognizing the question dialogue act. To reduce the feature space to 100 features, they use Mutual Information as the feature selection measure.
- the words in the email subject [10]

Temporal

- temporal distance to the previous turn*: For calculating the temporal distance, we can use the timestamps provided in some corpora. [13]
- temporal distance to the next turn [13]

Length

- the length of the current sentence: The length of a sentence is defined as the number of words in that sentence. [13, 18, 37]
- the length of the previous turn [13]
- the length of the next turn [13]

Structural

- the position of the turn in the discussion* [13, 15, 18, 22]
- message type (initial or reply) [15]
- authorship (2nd or 3rd post by the same author) [15]
- speaker [15, 18, 22]

Dialogue Acts

- dialogue act of the previous utterance* [22]
- accumulated dialogue acts of the previous utterances [22]
- dialogue act of the previous utterance by the same author* [22]

The structured predictors that we use in our work implicitly capture some of the features indicated above, such as:

- n-grams of the previous and the next turn
- the length of the previous turn
- the length of the next turn
- dialogue act of the previous utterance
- accumulated dialogue acts of the previous utterances

Lexical features such as unigrams and bigrams have been shown to be useful for the task of Dialogue Act (DA) modeling in previous studies [10, 13, 22, 33, 39], and unigrams have been shown to be the more effective of the two. So, as the lexical feature, we include the frequency of unigrams in our feature set.

Furthermore, Cohen et al. [11] show that using bigrams improves dialogue act modeling, in contrast to text classification. In text classification, unigrams are more effective than bigrams because the focus is on topic words; in dialogue act modeling, by contrast, we want to abstract away the topic words so that classifiers do not classify sentences based on their topics. So, in our work we also consider bigrams, to analyze their impact on dialogue act modeling in different conversational modalities.

For unigrams, we simply consider each word of the corpus as a feature, but for bigrams we choose the most discriminative bigrams as follows:

- take the first five and the last five words of the sentence
- extract their bigrams
- compute the Mutual Information (MI) score of each selected bigram as the count ratio

$$\frac{C(\text{bigram}, \text{class})}{C(\text{bigram}, \text{other classes})} \quad (4.1)$$

- sort the bigrams of each group based on their score
- choose the top 100 in each group

For sentences shorter than ten words, we pad the sentence with dummy tokens ("DUMMIES"). In the last step, we thus obtain the 100 most important bigrams from the first five words of the sentence and the 100 most important bigrams from the last five words.
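A minimal sketch of this selection procedure follows. The corpus representation (a list of (tokens, dialogue act) pairs) is our own, and a small smoothing constant is added to keep the ratio of Equation 4.1 defined when a bigram never occurs outside a class.

```python
from collections import Counter

def top_bigrams(corpus, position="first", k=100):
    """Rank bigrams from the first (or last) five words of each sentence
    by the count ratio of Equation 4.1 and keep the top k.

    `corpus` is assumed to be a list of (tokens, dialogue_act) pairs.
    """
    per_class, total = Counter(), Counter()
    for tokens, label in corpus:
        if len(tokens) < 10:                       # pad short sentences
            tokens = (tokens + ["DUMMIES"] * 10)[:10]
        window = tokens[:5] if position == "first" else tokens[-5:]
        for bigram in zip(window, window[1:]):
            per_class[(bigram, label)] += 1
            total[bigram] += 1

    def score(bigram, label):
        in_class = per_class[(bigram, label)]
        other = total[bigram] - in_class
        return in_class / (other + 1e-6)           # avoid division by zero

    ranked = sorted(per_class, key=lambda bl: score(*bl), reverse=True)
    return [bigram for (bigram, label) in ranked[:k]]
```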
Lexical features help to predict the class of sentences containing certain common words, such as the word "Thanks" in a sentence with the polite mechanism label. Similarly, bigrams help to find the class of sentences with common bigrams, such as "Do you" in a question.

Moreover, the length of the utterance is another beneficial feature for dialogue act recognition, which we add to our feature set [13, 18, 37]. The length of a sentence mostly helps to recognize the label of the shortest and longest utterances; for example, sentences with polite mechanism tags are mostly short, whereas statements are the longest utterances.

Furthermore, the speaker of an utterance has shown its utility for recognizing speech acts [15, 18, 22, 39]. The role of the speaker is an important indicator of the type of their interaction in the group: whether they are a leader who answers most of the questions or an ordinary participant who just asks questions. Sun and Morency [39] specifically employ a speaker-adaptation technique to demonstrate the effectiveness of this feature for dialogue act modeling.

We also include the relative position of a sentence in a post, since most previous studies [13, 15, 18, 22] prove the efficiency of this feature. Some tags can be easily recognized by it, such as polite mechanisms and questions: the former usually occur in the last part of the conversation, whereas the latter mostly initiate a conversation.

We do not consider temporal features, since timestamps are not provided in all the available corpora, and we choose only domain-independent features so as to be able to compare the methods across different conversational modalities.
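Putting the chosen features together, a sketch of the resulting per-sentence feature map might look as follows; the feature naming scheme is illustrative rather than the exact encoding we use.

```python
def sentence_features(tokens, author, index, n_sentences, vocab):
    """Sketch of the domain-independent feature vector for one sentence:
    unigram frequencies, speaker/author identity, sentence length, and the
    relative position of the sentence in the post/thread."""
    feats = {}
    for tok in tokens:                                   # unigram frequencies
        if tok in vocab:
            feats["uni=" + tok] = feats.get("uni=" + tok, 0) + 1
    feats["author=" + author] = 1                        # speaker/author
    feats["length"] = len(tokens)                        # number of words
    feats["rel_pos"] = index / max(n_sentences - 1, 1)   # position in [0, 1]
    return feats

print(sentence_features(["Do", "you", "agree", "?"], "alice", 0, 4,
                        {"Do", "you", "agree"}))
```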
4.2 Unsupervised Dialogue Act Recognition

Our motivation for studying dialogue act recognition across different domains is to analyze unsupervised, supervised and semi-supervised dialogue act modeling on different conversations, and possibly find a domain-independent method that performs well across different modalities. Among previous works on dialogue act recognition, only a few studies investigate dialogue act modeling in an unsupervised manner, so we decided to study this problem in the first step of our work.

We apply two methods, the Hidden Markov Model and the Mixed Membership Markov Model, for unsupervised dialogue act modeling on different datasets. The features for unsupervised dialogue act modeling in these two techniques are only the unigrams, which were widely used as an effective feature in previous unsupervised dialogue act modeling studies [18, 31, 34].

4.2.1 Hidden Markov Model

The Hidden Markov Model (HMM) is one of the most important and influential statistical models for processing text. An HMM is a sequence labeler, which means that it assigns a label to each unit of a hidden sequence. An HMM is described by:

- a set of $n$ states $S = s_1 s_2 \ldots s_n$
- transition probabilities $a_{ij}$ from state $i$ to state $j$: $A = a_{11} a_{12} \ldots a_{n1} \ldots a_{nn}$
- a sequence of observations $X = x_1 x_2 \ldots x_n$
- emission probabilities (observation likelihoods) $B = b_i(x_t)$, which give the probability of observation $x_t$ being generated from state $i$
- special start and end states $s_0, s_F$

[Figure 4.1: The graphical model for the Hidden Markov Model.]

Figure 4.1 shows the graphical model of an HMM. In a first-order Hidden Markov Model, we make the Markov assumption, which means that the probability of a state depends only on its previous state:

$$P(s_i \mid s_1 \ldots s_{i-1}) = P(s_i \mid s_{i-1}) \quad (4.2)$$

Learning HMM parameters: the forward-backward algorithm

Learning parameters in an HMM means finding the transition and emission probabilities given an observed sequence. This task is usually conducted by computing the maximum likelihood estimate of the parameters given the observed sequence. The forward-backward algorithm, also called the Baum-Welch algorithm, is an Expectation Maximization algorithm used for this task.

The forward-backward algorithm starts with an initial estimate of the HMM transition and emission probabilities. It proceeds iteratively with two steps, the E-step and the M-step. The Expectation step consists of computing the expected counts using the previous transition and emission probabilities. In the Maximization step, we use the computed counts to obtain new estimates of the transition and emission probabilities.

Decoding an HMM: the Viterbi algorithm

The Viterbi algorithm is a dynamic programming algorithm that finds the most probable tag sequence of hidden states. In a Hidden Markov Model, the Viterbi algorithm finds the most likely sequence of labels by using:

- the emission probabilities $b_j(x_t)$, which give the likelihood of observing symbol $x_t$ given the current state $j$
- the transition probabilities $a_{ij}$, which give the probability of moving from the previous state $s_i$ to state $s_j$
- the Viterbi probability of the previous path at time $t-1$

The Viterbi algorithm finds the most probable tag sequence using the following recurrence [20]:

$$v_t(j) = \max_{i=1}^{N} \; v_{t-1}(i)\, a_{ij}\, b_j(x_t) \quad (4.3)$$
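A compact sketch of this recurrence follows, computed in log space for numerical stability (the log-space form is our choice; Equation 4.3 is stated in probability space).

```python
def viterbi(obs, states, log_trans, log_emit, log_start):
    """Viterbi decoding (Equation 4.3) in log space.
    log_trans[i][j] = log a_ij; log_emit[j](x) = log b_j(x)."""
    V = [{s: log_start[s] + log_emit[s](obs[0]) for s in states}]
    back = []
    for x in obs[1:]:
        prev, col, ptr = V[-1], {}, {}
        for j in states:
            # Best previous state for landing in state j at this step.
            best_i = max(states, key=lambda i: prev[i] + log_trans[i][j])
            col[j] = prev[best_i] + log_trans[best_i][j] + log_emit[j](x)
            ptr[j] = best_i
        V.append(col)
        back.append(ptr)
    # Follow back-pointers from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path)), V[-1][last]
```

The returned log score is the quantity we later normalize by sequence length and use as a confidence score in the semi-supervised experiments (Section 4.4).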
4.2.2 Mixed Membership Markov Model

While the HMM captures the sequential structure between states through the Markov assumption, it has the limitation that each observation can only have one class. On the other hand, LDA allows each observation to be generated from a mixture of classes. In the Mixed Membership Markov Model (M4), proposed by Paul [31], the conversational flow is incorporated into the LDA model by defining a message's class distribution to depend on the class assignment of the previous message, as shown in Figure 4.2.

If we assume we have $K$ classes, $K+2$ features are considered, which also includes a feature indicating whether the block has no parent, and a bias feature to learn a default weight for each class. The transition parameters are called $\eta$. $z$ indicates a latent class and $\phi_z$ is a discrete distribution over word types. $\theta$ indicates the transition distribution over classes, which depends on the transition parameters and the feature vector of the parent block.

The generative process of M4 is as follows:

1. For each $(j, k)$ in the transition matrix $\eta_{K \times (K+2)}$:
   (a) Draw transition weight $\eta_{jk} \sim \mathcal{N}(0, \sigma^2)$. The 0-mean Gaussian prior is used for regularization.
2. For each class $j$:
   (a) Draw word distribution $\phi_j \sim \mathrm{Dirichlet}(\omega)$, smoothing the word distributions.
3. For each block $b$ of each document $d$:
   (a) Set the class probability $\theta_{bj} = \frac{\exp(\eta_j^{T} a)}{\sum_{j'} \exp(\eta_{j'}^{T} a)}$ for all classes $j$, where $a$ is the feature vector of the parent block of $b$.
   (b) For each token $n$ in block $b$:
       i. Sample class $z_{(b,n)} \sim \theta_b$.
       ii. Sample word $w_{(b,n)} \sim \phi_z$.

[Figure 4.2: The graphical models for HMM, LDA and M4 [31].]

In the Mixed Membership Markov Model, the words are generated by repeatedly sampling classes from the transition distribution $\theta$; then, for each sampled class $z$, a word is sampled from $\phi_z$, the class-specific distribution over words. In the HMM, by contrast, the class $z$ is sampled once from the transition distribution $\theta$, and the words are generated by repeatedly sampling from $\phi_z$.

A positive value of the transition weight $\eta_{jk}$ indicates that the occurrence of class $k$ in the parent block increases the probability of class $j$ appearing in the next block, and a negative value decreases this probability.

Paul adopts Monte Carlo EM for inference, alternating between one iteration of Gibbs sampling and one iteration of gradient ascent to optimize the transition parameters.

4.3 Supervised Dialogue Act Recognition

We also investigate the effectiveness of several supervised methods for this task. In order to capture the sequential dependency between the utterances, we employ two sequence labeling algorithms and one multiclass classifier:

- SVM-hmm
- CRF
- SVM-multiclass

The feature set for this task consists of:

- unigrams
- speaker/author
- length of the sentence
- relative position of the sentence in the post/thread

In supervised dialogue act modeling with SVM-hmm, we also add the bigram features to evaluate their effectiveness.

4.3.1 SVM-hmm

The SVM Hidden Markov Model predicts labels for the examples in a sequence [42]. This approach uses the Viterbi algorithm to find the highest scoring tag sequence for a given observation sequence. Being an HMM, the model makes the Markov assumption, which means that the label of a particular example is assigned by considering only the label of the previous example. The approach is considered a Support Vector Machine (SVM) because the parameters of the model are trained discriminatively to separate the label sequences by a large margin.

The SVM formulation allows us to use an arbitrary feature representation instead of just the tokens as in an HMM, and we also gain the benefit of discriminative training, which is not available in a standard HMM.

In SVM-hmm, each training example $x_i$ is labeled with a correct tag sequence $y_i$ drawn from the ambiguity class $Y_i$. We create the feature vector $Q(x_i, y_i)$ in such a way that it represents each possible tag for the input $x$ and its relationship to it. SVM-hmm trains emission weights for each tag and transition weights for adjacent tags. The learner weights the features using a vector $w$ such that the correct tag sequence receives a higher score than the incorrect tag sequences:

$$\forall i \;\; \forall y \in Y_i,\, y \neq y_i : \quad Q(x_i, y_i) \cdot w \; > \; Q(x_i, y) \cdot w \quad (4.4)$$

Then, using the learned weight vector $w$, SVM-hmm classifies test examples as:

$$\hat{y} = \operatorname*{argmax}_{y \in Y_i} \; Q(x_i, y) \cdot w \quad (4.5)$$

In this equation, the argmax is computed using the Viterbi algorithm [4].

4.3.2 SVM-multiclass

SVM-multiclass is a generalization of the binary SVM to a multiclass predictor [12]. If we have a training set $\{x_1, \ldots, x_n\}$ with labels $\{y_1, \ldots, y_n\}$ and an ambiguity class $Y$ containing $k$ classes, SVM-multiclass solves the following optimization problem:

$$\min \; \frac{1}{2} \sum_{i=1}^{k} \lVert w_i \rVert^2 + \frac{C}{n} \sum_{j=1}^{n} \xi_j \quad \text{s.t.} \;\; \forall y \in \{1..k\},\; \forall j \in \{1..n\} : \;\; (x_j \cdot w_{y_j}) \geq (x_j \cdot w_y) + 100\,\Delta(y_j, y) - \xi_j \quad (4.6)$$

$$\Delta(y_j, y) = \begin{cases} 0 & \text{if } y_j = y \\ 1 & \text{otherwise} \end{cases} \quad (4.7)$$

$C$ is the regularization parameter that trades off margin size against training error [17].

We compare the results of the sequence labelers to SVM-multiclass to show the importance of dialogue act dependencies: SVM-multiclass does not consider the sequential dependency between the examples.
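To illustrate the contrast, the sketch below shows how conversations can be serialized as sequences for the SVM-hmm package, which, as we understand it, reads an SVM-light-style format in which a qid field groups all sentences of one sequence; SVM-multiclass would instead receive each sentence as an independent example. The tag and feature numbering scheme here is our own.

```python
def write_svmhmm_file(path, conversations, tag_ids, feat_ids):
    """Write sequences in an SVM-light-style format, one line per sentence,
    with `qid` grouping all sentences of one conversation.

    `conversations` is assumed to be a list of conversations, each a list
    of (feature_dict, tag) pairs in conversational order."""
    with open(path, "w") as out:
        for qid, conv in enumerate(conversations, start=1):
            for feats, tag in conv:
                # Feature ids must be strictly increasing within a line.
                pairs = sorted((feat_ids[f], v) for f, v in feats.items())
                body = " ".join(f"{i}:{v}" for i, v in pairs)
                out.write(f"{tag_ids[tag]} qid:{qid} {body}\n")
```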
4.3.3 Conditional Random Fields

Conditional Random Fields (CRF) are a probabilistic framework for labeling and segmenting sequence data [24]. The main advantage of the CRF over the HMM is that it relaxes the HMM's assumption of conditional independence of the observed data. The HMM is a generative model that assigns a joint distribution over label and observation sequences, whereas, as shown in Figure 4.3, the CRF is a discriminative model, which defines the conditional probability distribution over label sequences given a particular observation sequence [48]:

[Figure 4.3: The graphical model for Conditional Random Fields.]

$$P(y_{1:n} \mid x_{1:n}) = \frac{1}{Z(x_{1:n})} \prod_{i=1}^{n} \phi(y_i, y_{i-1}, x_{1:n}) = \frac{1}{Z(x_{1:n}, w)} \prod_{i=1}^{n} \exp\!\big(w^{T} f(y_i, y_{i-1}, x_{1:n})\big) \quad (4.8)$$

For inference in the CRF, Viterbi decoding is used, as in SVM-hmm. Given the CRF parameters $w$, Viterbi finds the most probable tag sequence $y^{*}$ for the observations by maximizing $P(y \mid x)$:

$$y^{*} = \operatorname*{argmax}_{y} \; \exp\!\Big(\sum_{i=1}^{n} w^{T} f(y_i, y_{i-1}, x)\Big) \quad (4.9)$$

4.4 Semi-supervised Dialogue Act Recognition

Given the better performance of SVM-hmm for supervised dialogue act classification, we employ SVM-hmm for semi-supervised dialogue act modeling. To study the effect of adding unlabeled data in each conversational domain, we use the bootstrapping technique [1]. The feature set for this task is the same as for supervised dialogue act modeling:

- unigrams
- speaker/author
- length of the sentence
- relative position of the sentence in the post/thread

4.4.1 Bootstrapping

Bootstrapping is a technique for classification with a small set of labeled data and a large amount of available unlabeled data. In order to exploit the unlabeled data, we can label it using a model trained on the small labeled dataset. This process can be repeated iteratively to improve the performance of the trained model at each iteration. Figure 4.4 shows the process of adding unlabeled data to the dialogue act modeling system, which uses SVM-hmm as the classifier.

[Figure 4.4: Bootstrapping.]

Algorithm 1 describes our semi-supervised dialogue act modeling approach, which proceeds iteratively. In each iteration, it uses the trained model to predict labels for the unlabeled data. The examples in the unlabeled data for which the model is confident about its labels are added to the training data, expanding the size of the labeled data for the next iteration. This process terminates when the accuracy of the model on the test set (a held-out test set from the training data, not the final test set) decreases.

Algorithm 1: Semi-supervised dialogue act modeling

for all 5-fold cross-validation folds do
    {Divide the labeled data into final test data and bootstrapping data.}
    for all 5-fold cross-validation folds do
        {Divide the bootstrapping data into test and training data.}
        while true do
            1. Train SVM-hmm on the training data
            2. Classify the test data
            3. Compute accuracy on the test data
            if accuracy has decreased then
                break
            else
                Use Viterbi scores to add the most confident sequences to the training data
            end if
        end while
        Compute accuracy of the previous (best) model on the final test data
    end for
end for
return the average of the 25 accuracies

One of the challenging problems in our semi-supervised technique is defining the confidence score in SVM-hmm. We investigate the utility of the Viterbi score as the confidence score. To the best of our knowledge, this has not been studied in the machine learning literature, and the challenge in this case is the length difference between sequences. Therefore, in order to choose the most confident examples, we rank all the sequences by their Viterbi score normalized by the length of the sequence, and then select the top n sequences to be added to the labeled data.
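The confidence-based selection step of Algorithm 1 can be sketched as follows, assuming the Viterbi log score of each predicted sequence is available from the classifier; the function and parameter names are our own.

```python
def select_confident(sequences, viterbi_scores, fraction=0.01):
    """One bootstrapping step: rank unlabeled sequences by their Viterbi
    score normalized by sequence length and return the indices of the top
    fraction, which are then moved into the training data."""
    ranked = sorted(range(len(sequences)),
                    key=lambda i: viterbi_scores[i] / len(sequences[i]),
                    reverse=True)
    n = max(1, int(fraction * len(sequences)))
    return ranked[:n]
```

In our experiments the fraction corresponds to the n parameter of the framework, which we set to 1% (Section 5.2).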
Chapter 5

Experiments and Results

The first part of this chapter explains the metrics used to evaluate the different dialogue act modeling methods. The second part describes the experimental settings, and the last part presents the results of the dialogue act recognition techniques on the different corpora.

5.1 Evaluation Metrics

For evaluating the methods, we use several evaluation metrics: macro-average, micro-average, perplexity and variation of information. In this section we explain each metric.

By comparing the predicted label of an instance to its ground truth, we can obtain four different outcomes: true positive, false positive, true negative and false negative, as shown in Table 5.1 [47].

                  True L                True not L
Predicted L       true positive (tp)    false positive (fp)
Predicted not L   false negative (fn)   true negative (tn)

Table 5.1: Outcomes for an instance labeled with L.

Van Rijsbergen [46] introduced precision for information retrieval purposes. Given Table 5.1, precision is computed as:

$$\text{Precision} = \frac{tp}{tp + fp} \quad (5.1)$$

In multi-class classification, averages of the evaluation measures give a good overview of the results. These averages are known as the micro-average and the macro-average.

5.1.1 Micro-average

If we have a set of labels $\{L_1, L_2, \ldots, L_m\}$, each label has its corresponding true positives $\{tp_1, \ldots, tp_m\}$, false positives $\{fp_1, \ldots, fp_m\}$, and false negatives $\{fn_1, \ldots, fn_m\}$. As defined in [26, 36, 44], the micro-average is computed as follows:

$$\text{Micro-average} = \frac{\sum_{i=1}^{m} tp_i}{\sum_{i=1}^{m} tp_i + \sum_{i=1}^{m} fp_i} \quad (5.2)$$

5.1.2 Macro-average

The macro-average is the average of the precision of each individual label [44]:

$$\text{Macro-average} = \frac{1}{m} \sum_{i=1}^{m} \frac{tp_i}{tp_i + fp_i} \quad (5.3)$$

In contrast to the micro-average, the macro-average gives equal weight to each class. Consequently, the micro-average reflects the performance on the large classes in the test data; to get a good view of the performance on small classes, we also need to compute the macro-average [26]. As shown in Table 3.4, the distribution of classes is not balanced in any of the datasets, and many dialogue acts have small frequencies. In order to get the best picture of the classification results, we compute both the macro-average and the micro-average.

5.1.3 Perplexity

Perplexity is a good measure of the uncertainty of a model: it measures how well the model fits the test data. As specified in [5], the perplexity of a probability distribution $p$ with entropy $H(p)$ is:

$$2^{H(p)} = 2^{-\sum_x p(x) \log_2 p(x)} \quad (5.4)$$

In natural language processing, the perplexity per Viterbi-tagged sentence is defined as:

$$\exp\!\Big(-\frac{\log p(s_1, d_1, s_2, d_2, \ldots, s_n, d_n \mid s_0, d_0)}{n}\Big) \quad (5.5)$$

where $\{s_0, s_1, \ldots, s_n\}$ are the sentences in the test data and $\{d_0, d_1, \ldots, d_n\}$ is the winning tag sequence that the classifier assigns to the test sequence.

From these equations, we can conclude that the best model is the one that assigns tags to a sequence with higher probability, and hence lower perplexity, which corresponds to lower uncertainty.
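As a concrete illustration, a minimal computation of these averages and of the per-sentence perplexity of Equation 5.5 from per-label counts might look as follows; treating an undefined per-label precision as zero is our own convention.

```python
import math

def micro_macro(tp, fp):
    """Micro-average (Eq. 5.2) and macro-average (Eq. 5.3) precision.
    `tp` and `fp` map each label to its true/false positive counts;
    a label's precision is treated as 0 when it is never predicted."""
    labels = list(tp)
    micro = sum(tp.values()) / (sum(tp.values()) + sum(fp.values()))
    macro = sum(tp[l] / (tp[l] + fp[l])
                for l in labels if tp[l] + fp[l] > 0) / len(labels)
    return micro, macro

def sentence_perplexity(log_prob, n):
    """Per-sentence perplexity (Eq. 5.5), given the log probability of the
    winning tag sequence over n sentences."""
    return math.exp(-log_prob / n)

print(micro_macro({"S": 90, "QY": 5}, {"S": 20, "QY": 1}))
```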
5.1.4 Variation of Information

Meila [28] proposed an information-theoretic metric for comparing two clusterings of the same dataset. Previous criteria try to find a best match for each cluster and sum over the matches between the clusterings; in this way, they completely overlook the unmatched part of each cluster. The variation of information metric instead indicates how much information is in each of the clusterings, and how much information one clustering gives about the other.

In this approach, the uncertainty about the cluster of a data point in clustering C is defined as the entropy H(C). Moreover, the mutual information between two clusterings is defined as the information one clustering contains about the other. Suppose we have a random data point in our dataset. The uncertainty about its cluster in clustering C' is given by H(C'). Now suppose we are told the cluster of this data point in clustering C. Knowing this reduces our uncertainty about C'; this reduction, averaged over all data points, is equal to I(C, C'). The variation of information between two clusterings C and C' is then defined as:

$$VI(C, C') = H(C) + H(C') - 2\,I(C, C') \quad (5.6)$$

Variation of information is a measure of the uncertainty about one clustering given another clustering, so a lower variation of information corresponds to a higher similarity between the two clusterings.
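A minimal sketch of Equation 5.6, computed from two flat cluster assignments over the same data points (using natural logarithms, which only changes the unit of the measure):

```python
import math
from collections import Counter

def variation_of_information(c1, c2):
    """Variation of information (Eq. 5.6) between two clusterings,
    each given as a list of cluster ids aligned over the same data points."""
    n = len(c1)
    p1, p2 = Counter(c1), Counter(c2)
    joint = Counter(zip(c1, c2))
    H1 = -sum(c / n * math.log(c / n) for c in p1.values())
    H2 = -sum(c / n * math.log(c / n) for c in p2.values())
    I = sum(c / n * math.log((c / n) / ((p1[a] / n) * (p2[b] / n)))
            for (a, b), c in joint.items())
    return H1 + H2 - 2 * I

print(variation_of_information([0, 0, 1, 1], [0, 0, 1, 2]))
```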
5.2 Experimental Settings

For unsupervised dialogue act modeling, we use the package available from [31], which includes the Hidden Markov Model and the Mixed Membership Markov Model. For the SVM Hidden Markov Model, we use the SVM-hmm package, which predicts tags for the instances in a sequence using the Viterbi algorithm [16]. For Conditional Random Fields, we employ the Mallet CRF, which allows continuous and discrete features [27]. We also use SVM-multiclass [17] to compare its results to those of the sequence labelers, which consider the sequential dependencies between the dialogue acts.

In the unsupervised Mixed Membership Markov Model, the gradient step size is set to 0.1, and $\sigma^2$, the variance of the transition parameters $\eta$, is set to 10.

The results of the supervised classifiers are compared to the baseline, which is the majority class of each dataset. We apply 5-fold cross-validation to each dataset for the supervised learning methods, and compare the results of the different methods using macro-average and micro-average accuracies. In SVM-hmm, we set the C parameter, which specifies the trade-off between slack and the magnitude of the weight vector, to 5. The epsilon parameter, which indicates the precision to which the constraints should be satisfied, is set to 0.1.

In bootstrapping, we apply two 5-fold cross-validations, one for the test data and one for the final test data, which results in 25 accuracies. Based on our preliminary experiments, we set the n parameter of our semi-supervised framework to 1%. The parameter n defines the percentage of unlabeled data that is automatically added to the labeled data in each iteration of the semi-supervised framework.

5.3 Results

In this section, we report the results of the different unsupervised, supervised and semi-supervised dialogue act modeling methods on the different corpora.

5.3.1 Unsupervised Dialogue Act Recognition

We first report the results of unsupervised dialogue act recognition on the forum and email datasets, analyzing the performance of the Hidden Markov Model on the CNET, TripAdvisor and BC3 corpora.

Corpus                           CNET       TripAdvisor  BC3
Macro-average                    12.24%     24.49%       15.64%
Micro-average                    15.45%     24.36%       27.03%
Variation of information         5.497      8.493        7.782
Perplexity                       1009.44    1392.05      1621.458
Thread reconstruction accuracy   17.92%     4.216%       1.35%
(baseline)                       (17.77%)   (3.229%)     (1.24%)

Table 5.2: Results of unsupervised dialogue act modeling using the Hidden Markov Model on the BC3, CNET and TripAdvisor datasets.

Table 5.2 presents the results of unsupervised dialogue act modeling using the Hidden Markov Model on these corpora. Macro-average and micro-average accuracies are low on all the datasets. Based on the other metrics, variation of information and perplexity, unsupervised dialogue act recognition performs better on the CNET dataset than on TripAdvisor and BC3. One possible reason might be the similar conversation subjects across different CNET threads: the CNET corpus contains conversations about technical subjects, whereas the subjects of conversation are more varied in the other datasets.

We also investigate the performance of unsupervised dialogue act recognition on these corpora by computing their accuracy for thread reconstruction. As Table 5.2 shows, there is no significant improvement over the random baseline.

We also analyze the performance of the Mixed Membership Markov Model on the CNET and TripAdvisor corpora. Table 5.3 shows the results of unsupervised dialogue act modeling using the Mixed Membership Markov Model on these corpora. The variation of information and perplexity on TripAdvisor are improved by the Mixed Membership Markov Model, so we can argue that importing the LDA features into the HMM improves the performance. The macro-average and variation of information on CNET are also improved compared to the Hidden Markov Model.

Corpus                           CNET       TripAdvisor
Macro-average                    15.66%     12.25%
Micro-average                    14%        21.12%
Variation of information         4.153      3.369
Perplexity                       1026.017   1291.65
Thread reconstruction accuracy   18.21%     3.613%

Table 5.3: Results of unsupervised dialogue act modeling using the Mixed Membership Markov Model on the CNET and TripAdvisor datasets.

Since the results of unsupervised dialogue act modeling are not promising, we decided to narrow our analysis down to supervised and semi-supervised dialogue act modeling. Given the unbalanced frequencies of dialogue acts in conversations, we can argue that at least a little supervision is needed for dialogue act modeling on these corpora.

5.3.2 Supervised Dialogue Act Recognition

Tables 5.4 and 5.5 show the results of supervised classification on the different conversational modalities. We observe that the SVM-hmm and CRF classifiers outperform the SVM-multiclass classifier in all conversational domains. Both SVM-hmm and CRF consider the sequential structure of conversations, while this is ignored by SVM-multiclass. This shows that the sequential structure of the conversation is beneficial independently of the conversational modality. We can also observe that the SVM-hmm algorithm achieves the highest performance on all datasets.
          Baseline        SVM-multiclass   SVM-hmm         CRF
Corpus    Micro  Macro    Micro  Macro     Micro  Macro    Micro  Macro
BC3       69.56  8.34     73.57  8.34      77.75  18.20    72.18  14.9
CNET      36.75  9.09     34.8   9.3       58.7   17.1     40.3   11.5
MRDA      66.47  9.09     66.47  9.09      80.5   32.4     77.8   22.9
SWBD      46.44  6.25     46.5   6.25      74.32  30.13    73.04  24.05

Table 5.4: Results of supervised dialogue act modeling; columns are micro-average and macro-average accuracy.

          Baseline        SVM-multiclass   SVM-hmm          CRF
Corpus    Micro  Macro    Micro  Macro     Micro   Macro    Micro   Macro
BC3       69.56  8.34     +4.01  0         +8.19   +9.86    +2.62   +6.56
CNET      36.75  9.09     -1.95  +0.21     +21.95  +8.01    +3.55   +2.41
MRDA      66.47  9.09     0      0         +14.03  +23.31   +11.33  +13.81
SWBD      46.44  6.25     +0.06  0         +27.88  +23.88   +26.6   +17.8

Table 5.5: Results of supervised dialogue act modeling; columns are the improvement in micro-average and macro-average accuracy over the baseline.

Comparing the results across the different datasets, we note that the largest improvement of SVM-hmm and CRF is on SWBD, the phone conversation dataset. Moreover, supervised dialogue act recognition on synchronous conversations achieves better performance than on asynchronous conversations. We can argue that this is due to the less complex sequential structure of synchronous conversations. The lower macro-average accuracy on asynchronous conversations (i.e., forums and emails) can be explained in the same way.

Looking at the results on asynchronous conversations, we observe a larger improvement in micro-average accuracy on the CNET corpus. This might be due to two reasons: i) the dialogue act tagsets of the two corpora are different (i.e., there is no overlap between the tagsets); and ii) the conversational structure differs between forums and emails.

Comparing SVM-hmm and CRF

Most of the sentences in the four corpora are classified as the majority class, since the datasets are not balanced and the majority class covers a large part of each dataset. However, different classifiers may perform differently on distinct dialogue acts. Figure 5.1 shows the result of classifying the BC3 corpus using SVM-hmm and CRF. As the figures show, CRF performs better than SVM-hmm on the following classes:

- Action motivator (AC)
- Polite mechanism (P)
- Open-ended question (QO)

In order to obtain a better classifier, we could combine the results of these two classifiers for predicting these dialogue acts.

Figure 5.2 shows the confusion matrices for classifying BC3 with SVM-hmm and CRF. As the figures show, most of the misclassified sentences are predicted as statements. Moreover, SVM-hmm performs better than CRF at predicting questions such as wh-questions (QW), or/or-clause questions (QR) and yes-no questions (QY).

[Figure 5.1: Comparing SVM-hmm and CRF on the accuracy of each of the 12 classes of BC3: (a) SVM-hmm accuracy on BC3; (b) CRF accuracy on BC3.]

[Figure 5.2: Comparing SVM-hmm and CRF with confusion matrices: (a) confusion matrix of supervised dialogue act modeling on BC3 using SVM-hmm; (b) the same using CRF.]

Coarse-grained dialogue acts

We also analyze classification with a smaller number of dialogue act classes. Our main reasons for this experiment are:

- Most of the sentences are misclassified as statements, the majority class.
- The distribution of dialogue acts in the corpora is unbalanced; the frequency of some of the tags is less than 1% of the whole dataset, and we want to group these tags into a single class.
Table 5.6 shows the results of supervised dialogue act modeling on BC3 using SVM-hmm with the reduced tagset.

BC3 corpus      12 dialogue acts   5 dialogue acts
Micro-average   77.75%             81.18%
Macro-average   18.2%              40.55%

Table 5.6: Results of supervised dialogue act modeling on BC3 using SVM-hmm, with the tagset reduced to 5 dialogue acts. The baseline (majority class) accuracy is 69.56%.

The tagset is reduced to 5 dialogue acts as follows:

Statement (S)
- Statement (S)

Reply (R)
- Accept response (A)
- Reject response (R)
- Uncertain response (U)
- Acknowledge and appreciate (AA)

Question (Q)
- Yes-no question (QY)
- Or/or-clause question (QR)
- Wh-question (QW)
- Open-ended question (QO)

Suggestion (SU)
- Action motivator (AC)

Miscellaneous (M)
- Rhetorical question (QH)
- Polite mechanism (P)

As expected, the micro-average of classifying BC3 into 5 dialogue acts is not significantly better than classifying it into 12 dialogue acts, since the dataset is not balanced and most of the misclassified sentences are labeled as statements in both experiments. On the other hand, the macro-average of classifying BC3 into 5 dialogue acts is higher than the macro-average of classifying it into 12. This was expected, since the number of classes has decreased and some of the dialogue acts with small frequencies were merged into a single group.

[Figure 5.3: SVM-hmm accuracy on BC3 with 5 dialogue acts.]

Figure 5.3 shows the accuracy of classifying BC3 into 5 dialogue acts using SVM-hmm. SVM-hmm performs well at separating the two groups of questions and statements. In order to improve the results, we might classify the sentences in two stages: the first level would classify sentences into questions and non-questions, and the second level would predict labels for the sentences within each group.

[Figure 5.4: Confusion matrix of supervised dialogue act modeling on BC3 using SVM-hmm, with the tagset reduced to 5 tags.]

Figure 5.4 shows the confusion matrix of classifying BC3 into 5 dialogue acts using SVM-hmm. Nearly all the statements are correctly classified as statements. Similarly, most of the questions are classified correctly, and the misclassified ones are predicted as statements. The other classes are mostly misclassified (R to S, M to M and S, SU to S and Q).
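This grouping amounts to a simple relabeling of the data before training and evaluation; a minimal sketch:

```python
# The 12-to-5 BC3 grouping listed above, applied as a relabeling step
# before training and evaluation.
COARSE = {
    "S": "S",                                    # Statement
    "A": "R", "R": "R", "U": "R", "AA": "R",     # Reply
    "QY": "Q", "QR": "Q", "QW": "Q", "QO": "Q",  # Question
    "AC": "SU",                                  # Suggestion
    "QH": "M", "P": "M",                         # Miscellaneous
}

def coarsen(tags):
    return [COARSE[t] for t in tags]

print(coarsen(["QY", "S", "P"]))  # ['Q', 'S', 'M']
```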
Additional Features

After analyzing the performance of dialogue act modeling on the different datasets, we add bigrams to our feature set and analyze the effect of this feature. We extract the most discriminative bigrams from the first five and the last five words of the sentences and add them to the feature set.

Table 5.7 shows the results of supervised dialogue act modeling using SVM-hmm with the additional bigram features. Adding bigrams does not show a significant improvement over the previous experiments; the highest improvement is on the CNET corpus, where the micro-average increased from 58.7% to 59.29%.

Corpus   Without bigrams   With bigrams
BC3      77.75%            78.04%
CNET     58.7%             59.29%
MRDA     80.5%             80.5%
SWBD     74.32%            74.4%

Table 5.7: Micro-average of supervised dialogue act modeling using SVM-hmm, with the most discriminative bigrams added as features.

Tables 5.8 and 5.9 show the selected bigrams for each dataset. As we can see, most of the bigrams can be useful for recognizing one of the dialogue acts. As an example, the bigram (Thanks DUMMIES) in BC3 can be useful for recognizing the polite mechanism dialogue act when "Thanks" occurs at the end of a sentence.

Corpus  Selected bigrams

BC3     (if you), (be able), (Cheers DUMMIES), (not yet), (Also, why), (I've got), (is there), (I have), (opportunity to), (that case,), (moving forward), (you feel), (I've heard), (helpful, if), (sure who), (I am), (Hi, I), (Thanks DUMMIES), (has been), (Do you), (who from), (be helpful,), (that if), (been proposed), (How do), (forward with), (sure. DUMMIES), (confirm or), (might also), (We are), (feel we), (Regards, DUMMIES), (do people), (This is), (hope I), (feel about), (I missed), (would it), (I might), (have not), (proposed that), (I do), (Thanks, DUMMIES), (would like), (If the), (example: DUMMIES), (I'm not), (Thanks for), (Why did)

CNET    (i posted), (How loud), (Hey,I am), (I'll be), (just got), (go to), (have filled), (Hey I), (the problem), (be getting), (am wanting), (how to), (I see), (Hello there,You), (am already), (willing to), (Wondering on), (Since I'll), (I'd be), (wish to), (previous problem), (have tried), (I looked), (anyone know), (is better), (wanting to), (got a), (know that), (just bought), (guys might), (I think,), (tell me), (Bought an), (providers have), (be willing), (me how), (Ya sorry), (bought a), (assume the), (If so.), (as i), (While we)

MRDA    (that um), (they don't), (whatever DUMMIES), (maybe it's), (absolutely DUMMIES), (that's true), (i can't), (i remember), (that's interesting), (may not), (it will), (you can't), (thought that), (that's great), (wouldn't be), (it wouldn't), (it's really), (agree DUMMIES), (things like), (oh no), (because um), (that's ok), (got it), (don't wanna), (it's kind), (there were), (could have), (because DUMMIES), (as long), (oh that's), (that it's), (that's fine), (because i), (that's good), (thank you), (because we), (oops DUMMIES), (you might), (because uh), (because then), (and maybe), (can just), (as i), (that's it), (cuz it's), (thanks DUMMIES), (i really), (that'd be), (it's so)

SWBD    (Have you), (How long), (like we), (# Uhhuh.), (they're very), (byebye. DUMMIES), (Oh, really?), (Oh, uhhuh.), (Oh, really.), (And they're), (# Uhhuh), (it's funny), (# Oh.), (nice talking), (enjoyed talking), (Byebye. #), (one time), (I remember), (Oh. #), (good talking), (I'm pretty), (now I'm), (Huh. #), (Oh, great.), (Huh. DUMMIES), (uhhuh. #), (Oh really.), (Um. DUMMIES), (Oh, okay.), (Oh, okay,), (# Byebye.), (Hi. DUMMIES), (Oh. DUMMIES), (Wow. DUMMIES), (about you?), (Nice talking), (Oh, wow.), (Uh, my), (Bye. DUMMIES), (was always), (think maybe), (think I've), (That's great.), (old are), (talking with), (Right now), (So. DUMMIES), (my gosh.), (bye. DUMMIES), (and I'd), (We've got), (about yourself?), (Uhhuh. #), (# Huh.), (yourself? DUMMIES), (faint? Uhhuh.), (there's been), (And I've)

Table 5.8: Sample bigrams extracted from the first five words of each sentence by the mutual information method.
Corpus  Selected bigrams

BC3     (with alt), (so I), (this meeting,), (with a), (fun family), (aside, does), (a page), (present ?), (gone overboard), (more convenient), (the full), (to demonstrate), (to do), (the use), (although the), (suppose it'll), (visit to), (actually doing), (less than), (I was), (opportunity to), (comments in), (of this), (create a), (regrets. DUMMIES), (response. DUMMIES), (problems regarding), (such a), (discuss the), (I am), (need the), (but would), (ever, but), (meeting in), (mind, together), (does anyone), (not gone), (know what's), (this ?), (Accessibility ?), (to get), (the details), (this as), (we're less), (some fun), (this opportunity), (would you), (who will), (is actually), (details of), (attending ?), (be attending), (positive response.), (you all), (be somewhat), (it'll be)

CNET    (could not), (machine and), (my current), (to give), (I'm planning), (your working), (the CD), (help from), (about this), (a while), (and almost), (soon, I'm), (available for), (speakers? If), (responses to), (help. DUMMIES), (having great), (not load), (how to), (I was), (hand you), (the ready), (with this), (use this), (ready answers), (download it), (not showing), (planning to), (right hand), (are going), (my guess), (you will), (from responses), (me from), (then possibly), (trouble with), (Direct, on), (the old), (no way), (back up), (goes for), (having trouble), (top right), (while back,), (they are), (answers about), (There were), (doing work), (onto my), (from a), (and having), (you back), (PC to), (with help), (you put), (I've been), (possibly the), (response. I've), (buy one.), (been doing), (comes from)

MRDA    (the second), (we're doing), (is gonna), (whatever DUMMIES), (figure out), (later DUMMIES), (could do), (listen to), (time DUMMIES), (was DUMMIES), (make it), (it could), (here is), (much more), (lot DUMMIES), (so forth), (too much), (you can't), (lots of), (but i), (you should), (it's just), (doing this), (i thought), (at that), (know you), (and that), (guess DUMMIES), (good idea), (or less), (use a), (and we), (will have), (know that), (a big), (good DUMMIES), (and ), (different DUMMIES), (before DUMMIES), (some kind), (that there), (it's DUMMIES), (fine DUMMIES), (is like), (right DUMMIES), (yeah DUMMIES), (you don't), (problem DUMMIES), (actually DUMMIES), (of um), (different ), (sure that), (amount of), (has a), (bad DUMMIES)

SWBD    (who was), (well, you), (we went), (it had), (I thought), (than I), (was about), (old and), (why I), (uh, my), (year old), (it's been), (it's kind), (here is), (very well), (that uh,), (that, I), (same thing), (I've ever), (thing I), (times a), (it's really), (we didn't), (I've done), (more, uh,), (probably the), (doing that), (when they're), (hundred and), (now is), (um, a), (these things), (where I), (bit more), (to listen), (uh, and), (to play), (was, uh,), (starting to), (good DUMMIES), (around here), (a week,), (about two), (other day), (point where), (from, uh,), (seems like), (I saw), (I read), (too many), (know, they), (same thing.), (doesn't have), (for us), (really don't), (I've had), (I've been), (every time), (them are), (the day)

Table 5.9: Sample bigrams extracted from the last five words of each sentence by the mutual information method.
5.3.3 Semi-supervised Dialogue Act Recognition

Table 5.10 shows the results of our bootstrapping method on all the domains.

Corpus            Supervised   Bootstrapping
BC3 + W3C         77.15%       77.21%
CNET + BC3 blog   58.93%       58.93%
MRDA + AMI        80.65%       80.65%
SWBD              74.87%       74.87%

Table 5.10: Accuracy of semi-supervised dialogue act modeling.

We observe that the bootstrapping method in our experiments does not improve the performance of supervised dialogue act modeling. We suspect this is due to the following reasons: i) using the normalized Viterbi score as the confidence score for pruning the erroneous unlabeled data; ii) setting a common parameter n across datasets that vary in size; and iii) our semi-supervised algorithm itself (i.e., the bootstrapping approach). In light of these results, we conclude that a more comprehensive semi-supervised method is needed to better reveal the potential of using unlabeled data along with the labeled data to improve the performance of dialogue act recognition systems. Further investigation of these possible reasons is left as future work.

5.4 Discussion

In this section, we investigate the performance of SVM-hmm, the best of the dialogue act modeling methods we compared, studying its effectiveness for classifying each of the dialogue acts in each dataset.

As shown in [2], the generalization performance of SVM-hmm is superior to that of the CRF. This superiority also holds for the dialogue act modeling task across all the conversational modalities. However, as investigated by Keerthi and Sundararajan [21], the discrepancy in the performance of these methods may arise from the different feature functions that the two methods use, and they might perform similarly when using the same feature functions.

To further analyze the performance, Table 5.11 shows the accuracy of SVM-hmm for each dialogue act.

Dialogue Act                      BC3   CNET   MRDA   SWBD
Accept response                   0%    -      -      23%
Acknowledge and appreciate        0%    -      -      41%
Action motivator                  0%    -      -      7%
Polite mechanism                  18%   -      -      0%
Rhetorical question               0%    -      0%     0%
Open-ended question               0%    -      1%     49%
Or/or-clause question             13%   -      -      22%
Wh-question                       15%   -      55%    59%
Yes-no question                   74%   -      1%     32%
Reject response                   0%    -      -      61%
Statement                         99%   -      91%    96%
Uncertain response                0%    -      -      0%
Hedge                             -     -      -      16%
Backchannel                       -     -      87%    89%
Self-talk                         -     -      -      0%
Signal-non-understanding          -     -      -      7%
Floor holder                      -     -      67%    -
Floor grabber                     -     -      24%    -
Hold                              -     -      21%    -
Or clause after Yes-No question   -     -      5%     -
Or question                       -     -      3%     -
Question-Question                 -     53%    -      -
Question-Add                      -     4%     -      -
Question-Confirmation             -     0%     -      -
Question-Correction               -     0%     -      -
Answer-Answer                     -     50%    -      -
Answer-Add                        -     1%     -      -
Answer-Confirmation               -     0%     -      -
Reproduction                      -     0%     -      -
Answer-Objection                  -     0%     -      -
Resolution                        -     0%     -      -
Other                             -     0%     -      -

Table 5.11: SVM-hmm accuracy for different dialogue acts.

SVM-hmm succeeds in predicting the statement speech acts across the email, meeting, and phone datasets. Similarly, it performs well at predicting the answer-answer speech acts in the forum dataset. Moreover, the backchannel speech act, a common dialogue act in the MRDA and SWBD datasets, is predicted with high accuracy. However, there are some common dialogue acts across the datasets that the SVM-hmm algorithm fails to recognize well (e.g., rhetorical questions and uncertain responses); this is mainly due to the small frequency of such dialogue acts in our corpora.

We can also observe that supervised dialogue act modeling for the same set of dialogue acts performs better on the spoken conversations than on the written discussions.
For example, accept response, acknowledge and appreciate, action motivator, open-ended question, or-clause question, wh-question, and reject response achieve higher results in the SWBD and MRDA conversations than the same dialogue acts in BC3. However, there are a few dialogue acts on which the SVM-hmm model performs better on the email dataset than on the meeting and phone corpora, e.g., statement, polite mechanism, and yes-no question. A possible reason is that the frequencies of these dialogue acts vary across the datasets.

In general, as Table 3.4 shows, the distribution of classes is unbalanced in all the datasets. Consequently, during the classification step, most of the utterances are labeled with the statement dialogue act. This often hurts the performance of a classifier on low-frequency classes. A possible approach to this problem is to cluster the correlated dialogue acts into the same group and apply a DA modeling approach in a hierarchical manner.

We now analyze the strengths and weaknesses of supervised dialogue act modeling with SVM-hmm in each type of conversation individually.

BC3: SVM-hmm succeeds in classifying most of the statement and yes-no question speech acts in the BC3 corpus (see Table 5.11). However, it does not show high accuracy in classifying polite mechanisms such as "thanks" and "regards". Through error analysis, we observed that in most of these cases the error arose from the voting algorithm. Moreover, the improvement of supervised dialogue act modeling on the BC3 corpus is smaller than on the other datasets. This suggests that email conversation is a challenging domain for dialogue act recognition.

CNET: The inventory of dialogue acts in the CNET dataset can be viewed as two groups of question and answer dialogue acts, and we would need more sophisticated features in order to classify the posts into the fine-grained dialogue acts. SVM-hmm succeeds in predicting the labels of the question-question and answer-answer dialogue acts, but it performs poorly on the other labels. The improvement of dialogue act modeling over the baseline is significant for this dataset. To further improve the performance, hierarchical dialogue act classification could be applied, in which the posts would first be classified into question and non-question dialogue acts.

MRDA: SVM-hmm performs well at predicting the statement, floor holder, backchannel, and wh-question classes. Floor holders and backchannels are mostly short utterances such as "ok", "um", and "so", and we believe the length and unigram features are very effective for predicting these dialogue acts. On the other hand, SVM-hmm fails at predicting the other types of questions, such as rhetorical questions and open-ended questions, classifying them as statements. Arguably, by adding more sophisticated features such as POS tags, SVM-hmm would perform better at classifying these speech acts.

SWBD: The improvement of supervised dialogue act recognition on SWBD is higher than in the other domains. Supervised dialogue act classification correctly predicts most of the statement, reject response, wh-question, and backchannel classes. However, SVM-hmm cannot predict some dialogue acts specific to phone conversations, such as self-talk and signal-non-understanding.
There are few utterances in the corpus with these dialogue acts, and most of them are classified as statements.

The results of semi-supervised dialogue act recognition in Table 5.10 show that we need a more sophisticated semi-supervised approach in order to improve on supervised dialogue act modeling by adding unlabeled data. As suggested earlier, there are several criteria that need to be taken into consideration to exploit unlabeled data effectively.

Chapter 6

Conclusion and Future Work

We have studied the effectiveness of sophisticated machine learning algorithms for dialogue act modeling across a comprehensive set of different spoken and written conversations. Through extensive experiments, we have shown that our proposed SVM-hmm approach achieves promising results on different synchronous and asynchronous conversations, and outperforms our baseline and the other approaches significantly. Moreover, we have exploited a domain-independent feature set that is similarly effective in spoken and written conversations.

In addition, we investigated the effectiveness of unsupervised and semi-supervised dialogue act recognition methods on different challenging conversational datasets. We also conducted several experiments on dialogue act modeling with coarse-grained dialogue acts, and studied the effectiveness of bigrams for dialogue act recognition.

For future work, there remain a number of modifications and extensions that can be made to the studied methods:

- For further improvement in our supervised framework, we can incorporate other lexical (e.g., trigram) and syntactic (e.g., POS tag) features.
- We can also augment our feature set with domain-specific features, such as prosodic features for spoken conversations, to investigate the effect of domain-specific features in addition to domain-independent ones.
- Two-stage classification is another way of improving the performance: we can classify the dialogue acts into the two groups of questions and non-questions, and at the second level detect more fine-grained labels.
- In order to take advantage of the large amount of conversational data generated daily, we can improve our semi-supervised approach; one possible direction is developing a more sophisticated semi-supervised algorithm.
- Another idea for improving our semi-supervised approach is to define a different confidence score, such as the posterior probability, to filter more noise and improve the quality of the training data.

Bibliography

[1] Steven Abney. Bootstrapping. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, 2002.

[2] Yasemin Altun, Ioannis Tsochantaridis, and Thomas Hofmann. Hidden Markov Support Vector Machines. Proceedings of the 20th International Conference on Machine Learning, 2003.

[3] John L. Austin. How to Do Things with Words. Harvard University Press, 1962.

[4] Susan Bartlett, Grzegorz Kondrak, and Colin Cherry. Automatic Syllabification with Structured SVMs for Letter-to-Phoneme Conversion. 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-08: HLT), 2008.

[5] Peter F. Brown, Vincent J. Della Pietra, Robert L. Mercer, Stephen A. Della Pietra, and Jennifer C. Lai. An estimate of an upper bound for the entropy of English. MIT Press, 1992.

[6] Giuseppe Carenini, Gabriel Murray, and Raymond Ng. Methods for Mining and Summarizing Text Conversations. Morgan & Claypool Publishers, 2011.

[7] Giuseppe Carenini, Raymond T. Ng, and Xiaodong Zhou. Summarizing emails with conversational cohesion and subjectivity. ACL-08, 2008.
