UBC Theses and Dissertations

UBC Theses Logo

UBC Theses and Dissertations

Discourse analysis of asynchronous conversations Joty, Shafiq Rayhan


A well-written text is not merely a sequence of independent and isolated sentences, but instead a sequence of structured and related sentences. It addresses a particular topic, often covering multiple subtopics, and is organized in a coherent way that enables the reader to process the information. Discourse analysis seeks to uncover such underlying structures, which can support many applications including text summarization and information extraction. This thesis focuses on building novel computational models of different discourse analysis tasks in asynchronous conversations; i.e., conversations where participants communicate with each other at different times (e.g., emails, blogs). Effective processing of these conversations can be of great strategic value for both organizations and individuals. We propose novel computational models for topic segmentation and labeling, rhetorical parsing and dialog act recognition in asynchronous conversation. Our approaches rely on two related computational methodologies: graph theory and probabilistic graphical models. The topic segmentation and labeling models find the high-level discourse structure; i.e., the global topical structure of an asynchronous conversation. Our graph-based approach extends state-of-the-art methods by integrating a fine-grained conversational structure with other conversational features. On the other hand, the rhetorical parser captures the coherence structure, a finer discourse structure, by identifying coherence relations between the discourse units within each comment of the conversation. Our parser applies an optimal parsing algorithm to probabilities inferred from a discriminative graphical model which allows us to represent the structure and the label of a discourse tree constituent jointly, and to capture the sequential and hierarchical dependencies between the constituents. Finally, the dialog act model allows us to uncover the underlying dialog structure of the conversation. We present unsupervised probabilistic graphical models that capture the sequential dependencies between the acts, and show how these models can be trained more effectively based on the fine-grained conversational structure. Together, these structures provide a deep understanding of an asynchronous conversation that can be exploited in the above-mentioned applications. For each discourse processing task, we evaluate our approach on different datasets, and show that our models consistently outperform the state-of-the-art by a wide margin. Often our results are highly correlated with human annotations.

Item Media

Item Citations and Data


Attribution-NonCommercial-NoDerivatives 4.0 International