Discourse analysis of asynchronous conversations

UBC Theses and Dissertations

Discourse analysis of asynchronous conversations Joty, Shafiq Rayhan

Abstract

A well-written text is not merely a sequence of independent and isolated sentences, but a sequence of structured and related sentences. It addresses a particular topic, often covering multiple subtopics, and is organized in a coherent way that enables the reader to process the information. Discourse analysis seeks to uncover such underlying structures, which can support many applications, including text summarization and information extraction. This thesis focuses on building novel computational models of different discourse analysis tasks in asynchronous conversations, i.e., conversations where participants communicate with each other at different times (e.g., emails, blogs). Effective processing of these conversations can be of great strategic value for both organizations and individuals. We propose novel computational models for topic segmentation and labeling, rhetorical parsing, and dialog act recognition in asynchronous conversations. Our approaches rely on two related computational methodologies: graph theory and probabilistic graphical models. The topic segmentation and labeling models find the high-level discourse structure, i.e., the global topical structure of an asynchronous conversation. Our graph-based approach extends state-of-the-art methods by integrating a fine-grained conversational structure with other conversational features. The rhetorical parser, in contrast, captures the coherence structure, a finer discourse structure, by identifying coherence relations between the discourse units within each comment of the conversation. Our parser applies an optimal parsing algorithm to probabilities inferred from a discriminative graphical model, which allows us to represent the structure and the label of a discourse tree constituent jointly and to capture the sequential and hierarchical dependencies between the constituents. Finally, the dialog act model allows us to uncover the underlying dialog structure of the conversation. We present unsupervised probabilistic graphical models that capture the sequential dependencies between the acts, and show how these models can be trained more effectively based on the fine-grained conversational structure. Together, these structures provide a deep understanding of an asynchronous conversation that can be exploited in the above-mentioned applications. For each discourse processing task, we evaluate our approach on different datasets and show that our models consistently outperform the state of the art by a wide margin. Our results are often highly correlated with human annotations.
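To make the idea of "sequential dependencies between the acts" concrete, the following is a minimal sketch of Viterbi decoding in a hidden Markov model whose hidden states are dialog acts. It is not the thesis's model: the abstract describes unsupervised models that are learned from data, whereas this sketch only decodes with hand-set probabilities. The act labels, the cue symbols (`wh`, `stmt`), and every number below are hypothetical and chosen purely for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state sequence for the observations."""
    if not observations:
        return []
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
             for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, best[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][observations[t]])
            back[t][s] = prev
    # Follow the back-pointers from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


# Hypothetical toy parameters (assumptions, not values from the thesis).
STATES = ["question", "answer"]
START_P = {"question": 0.6, "answer": 0.4}
TRANS_P = {
    "question": {"question": 0.2, "answer": 0.8},
    "answer": {"question": 0.3, "answer": 0.7},
}
EMIT_P = {
    "question": {"wh": 0.7, "stmt": 0.3},
    "answer": {"wh": 0.1, "stmt": 0.9},
}

# One cue symbol per sentence of a hypothetical conversation.
decoded = viterbi(["wh", "stmt", "stmt"], STATES, START_P, TRANS_P, EMIT_P)
print(decoded)  # ['question', 'answer', 'answer']
```

The transition table is what encodes the sequential dependency: a sentence is more likely to be labeled an answer when the preceding act is a question. In an unsupervised setting these tables would be estimated from unlabeled conversations rather than written by hand.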

Item Metadata

Title	Discourse analysis of asynchronous conversations
Creator	Joty, Shafiq Rayhan
Publisher	University of British Columbia
Date Issued	2013
Genre	Thesis/Dissertation
Type	Text
Language	eng
Date Available	2014-01-02
Provider	Vancouver : University of British Columbia Library
Rights	Attribution-NonCommercial-NoDerivatives 4.0 International
DOI	10.14288/1.0165726
URI	http://hdl.handle.net/2429/45674
Degree (Theses)	Doctor of Philosophy - PhD
Program (Theses)	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2014-05
Campus	UBCV
Scholarly Level	Graduate
Rights URI	http://creativecommons.org/licenses/by-nc-nd/4.0/
Aggregated Source Repository	DSpace
