Mining Unstructured Social Streams: Cohesion, Context and Evolution

by

Pei Li

B.Eng., Huazhong University of Science & Technology, 2007
M.Eng., Renmin University of China, 2010

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY in The Faculty of Graduate and Postdoctoral Studies (Computer Science)

THE UNIVERSITY OF BRITISH COLUMBIA (Vancouver)

March 2017

© Pei Li 2017

Abstract

As social websites like Twitter greatly influence people's digital life, unstructured social streams have become prevalent: fast-surging streams of textual posts without formal structure or schema between posts or inside the post content. Modeling and mining unstructured social streams on Twitter is a challenging and fundamental problem in social web analysis, which leads to numerous applications, e.g., recommending social feeds like "what's happening right now?" or "what are related stories?". Current social stream analysis merely returns an overwhelming list of posts in response to queries, with little aggregation or semantics. The design of the next generation of social stream mining algorithms faces various challenges, especially the effective organization of meaningful information from noisy, unstructured, and streaming social content.

The goal of this dissertation is to address the most critical challenges in social stream mining using graph-based techniques. We model a social stream as a post network, and use "event" and "story" to capture groups of aggregated social posts presenting similar content at different granularities, where an event may contain a series of stories. We highlight our contributions to social stream mining from a structural perspective as follows. We first model a story as a quasi-clique, which is cohesion-persistent regardless of the story size, and propose two solutions, DIM and SUM, to search for the largest story containing given query posts, by deterministic and stochastic means, respectively. To detect all stories in the time window of a social stream and support context-aware story-telling, we propose CAST, which defines a story as a (k, d)-Core in the post network and tracks the relatedness between stories. We propose Incremental Cluster Evolution Tracking (ICET), an incremental computation framework for event evolution on evolving post networks, with the ability to track the evolution patterns of social events as time rolls on. The approaches in this dissertation are based on two hypotheses: users prefer correlated posts to individual posts in post stream modeling, and a structural approach is better than frequency/LDA-based approaches in event and story modeling. We verify these hypotheses by crowdsourcing-based user studies.

Preface

The modeling of social streams discussed in Chapter 3 is primarily based on our publication at KDD 2013 (KeySee [42], a system that supports keyword search on social events). The work on cohesion-persistent story search presented in Chapter 4 is based on our publication at SDM 2016 [41]. The context-aware story-teller (CAST) discussed in Chapter 5 is mainly based on our publication at CIKM 2014 [43], and partly based on our publication at ICDE 2012 [45]. The work on incremental cluster evolution tracking (ICET) presented in Chapter 6 is based on our paper at ICDE 2014 [44]. All of the above-mentioned publications [41–45] are collaborations with my supervisor, Prof. Laks V. S. Lakshmanan of the University of British Columbia, Canada. Publications [42–44] are also collaborations with Prof. Evangelos Milios of Dalhousie University, Canada.
Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
1 Introduction
  1.1 Unstructured Social Streams
    1.1.1 Concept and Modeling
    1.1.2 Mining Unstructured Social Streams
  1.2 Opportunities and Challenges
    1.2.1 Cohesion-Persistent Story Search
    1.2.2 Story Context Mining
    1.2.3 Event Evolution Tracking
    1.2.4 Evaluation of Mining Tasks
  1.3 Contributions and Research Plan
    1.3.1 Query-Driven Quasi-Clique Maximization
    1.3.2 Context-Aware Story-Telling
    1.3.3 Incremental Event Evolution Tracking
    1.3.4 Targeted Crowdsourcing
  1.4 Thesis Outline
2 Related Work
  2.1 Graph Mining
    2.1.1 Graph Clustering
    2.1.2 Community Detection and Search
    2.1.3 Dense Subgraph Mining
    2.1.4 Subgraph Relatedness Computation
  2.2 Social Media Analytics
    2.2.1 Frequency Based Peak Detection
    2.2.2 Entity Network Based Approaches
  2.3 Topic Detection and Tracking
3 Modeling Unstructured Social Streams
  3.1 Motivation
  3.2 Social Stream Preprocessing
  3.3 Post Network Construction
    3.3.1 Post Similarity Computation
    3.3.2 Post Network
    3.3.3 Linkage Search
  3.4 Stories and Events
    3.4.1 Stories and Events in Social Streams
    3.4.2 Stories and Events in Post Network
  3.5 Context and Evolution
  3.6 Comparing with Other Modeling Methods
  3.7 Discussion and Conclusion
4 Cohesion-Persistent Story Search
  4.1 Introduction
  4.2 Problem Overview
  4.3 Pre-solution on Core Tree
    4.3.1 Core Tree: Definition and Properties
    4.3.2 Search for Pre-Solution
  4.4 Query-Driven Quasi-Clique Maximization
    4.4.1 Objective and Operations
    4.4.2 Efficient Solution Search and Rank
    4.4.3 Iterative Maximization Algorithms
  4.5 Experimental Study
    4.5.1 Core Tree Construction
    4.5.2 Performance Evaluation
  4.6 Discussion and Conclusion
5 Context-Aware Story-Telling
  5.1 Introduction
  5.2 Story Vein
  5.3 Transient Story Discovery
    5.3.1 Defining a Story
    5.3.2 Story Formation
  5.4 Story Context Tracking
    5.4.1 Story Relatedness Dimensions
    5.4.2 Story Context Search
    5.4.3 Interpretation of Story Vein
  5.5 Experimental Study
    5.5.1 Tuning Post Network
    5.5.2 Quality Evaluation
    5.5.3 Performance Testing
  5.6 Discussion and Conclusion
6 Incremental Event Evolution Tracking
  6.1 Introduction
  6.2 Problem Formalization
  6.3 Incremental Tracking Framework
  6.4 Skeletal Graph Clustering
    6.4.1 Node Prioritization
    6.4.2 Skeletal Cluster Identification
  6.5 Incremental Cluster Evolution
    6.5.1 Fading Time Window
    6.5.2 Network Evolution Operations
    6.5.3 Skeletal Graph Evolution Algebra
    6.5.4 Incremental Cluster Evolution
  6.6 Incremental Algorithms
  6.7 Experiments
    6.7.1 Tuning Skeletal Graph
    6.7.2 Cluster Evolution Tracking
    6.7.3 Running Time of Evolution Tracking
  6.8 Discussion and Conclusion
7 Crowdsourcing-Based User Study
  7.1 Introduction
  7.2 Related Work
  7.3 Hypotheses in Social Stream Mining
  7.4 Quality Control for Crowdsourcing
  7.5 Experiments
  7.6 Discussion and Conclusion
8 Summary and Future Research
  8.1 Summary
  8.2 Future Research
Bibliography
List of Tables

3.1 Event evolution patterns
4.1 Notation
4.2 Three maximization operations
4.3 Running time for core tree construction
4.4 Aggregated quality scores for different methods from 100 tests
5.1 Major notations
5.2 The number of edges in the post network, and the number of stories and relatedness links in the story vein, as the temporal proximity function changes; a story is defined as a (5, 3)-Core
5.3 Top 20 stories detected by LDA from Tech-Full, described by high-frequency words; the top 100 posts of each topic are treated as a story
5.4 The accuracy of RCS as the number of simulations n grows; n is proportional to the neighboring post size of story S, and accuracy is measured based on DCS
6.1 Notation
6.2 Tuning post network
List of Figures

1.1 The illustration of major components in a social website
1.2 The similarity between posts in a time window of a social stream is captured by a post network; each story or event in the social stream can be modeled as a specifically defined subgraph of the post network
1.3 An illustration of an event and its included stories; this event is about "MH370 Search" and has a clear evolution history (emerge, grow and decay) in a time span of three months, and includes several related stories
4.1 The query S is marked by solid nodes, with λ = 0.5; a maximal quasi-clique and the query-driven maximum quasi-clique (QMQ) are circled by a dotted ellipse and a solid ellipse, respectively
4.2 Architecture of the efficient query-driven maximum quasi-clique search by DSG tree
4.3 A graph G with k-Cores identified recursively in (a) and its corresponding core tree in (b)
4.4 (a) A small graph and (b) its corresponding core tree, showing the deepest tree nodes containing graph nodes v1, ..., v5; G5 and G6 in (b) are annotated by dotted circles in (a)
4.5 (a) An example of QMQ maximization; (b) the inflection node on F(x), where solutions with rank x ≤ x* are preferred
4.6 The number of maximal cores first rapidly increases and then slowly decreases with the coreness k on three data sets
4.7 (a) Running time of 1000 pre-solution queries as the query set size increases; (b) portion of pre-solutions chosen from Gl, out of all pre-solutions, for different query set sizes; λ = 0.9
4.8 (a) Running time of Add and Add-MC for different solution sizes, with query node size |S| = 3; (b) ratio between the search spaces of Add and Add-MC, given the current solution, as query size increases; (c) average number of iterations for various methods on the LiveJournal data set; (d) average running time of different methods on LiveJournal as query size increases; λ = 0.9 by default
5.1 The workflow of StoryVein, with three major steps: (1) post network construction, (2) transient story discovery and (3) story context search
5.2 (a) A 3-Core without similarity witness between p1 and p2; (b) the generation of a (3, 1)-Core from a 3-Core
5.3 Top 10 results of HashtagPeaks, MentionPeaks and EntityPeaks on the Tech-Lite dataset; the ground truth for precision and recall is the top 10 major stories selected from mainstream technology news websites
5.4 Top 10 transient stories generated by our proposed story-teller on Tech-Lite; each story is represented as a (5, 3)-Core and rendered as an entity cloud for human reading; some related transient stories are linked by lines, and the curve on the bottom is the breakdown of tweet frequency for each day in January 2012
5.5 A fragment of the story vein tracked from CNN-News, spanning January 1 to June 1, 2014
5.6 An example illustrating our context-aware story-teller; each tag cloud is a single story identified from the post network, and sets of stories with higher relatedness are grouped together in rectangles to aid readability
5.7 (a) Running time of (d+1)-Core generation and (d+1, d)-Core generation (Zigzag, NodeFirst); (b) the number of connected components generated by (d+1)-Cores and (d+1, d)-Cores; (c) running time of different context search approaches; all experiments run on the Tech-Full dataset with the time window set to one week
6.1 The post network captures the correlation between posts in the time window at each moment, and evolves as time rolls on; the skeletal graph is shown in bold; from moment t to t+1, the incremental tracking framework maintains clusters and monitors the evolution patterns on the fly
6.2 (a) The commutative diagram between dynamic networks Gt, Gt+1 and cluster sets St, St+1; the "divide-and-conquer" baseline and our incremental tracking are annotated by dotted and solid lines, respectively; (b) the workflow of the incremental tracking module, which tracks cluster evolution dynamics by consuming only the updating subgraph ΔGt+1
6.3 The functional relationships between the different types of objects defined in this work, e.g., the arrow from Gt to its skeletal graph with label Ske; refer to Table 6.1 for notation
6.4 An illustration of the fading time window from time t to t+1, where post priority may fade w.r.t. the end of the time window; Gt is updated by deleting subgraph Gold and adding subgraph Gnew
6.5 (a) The relationships between primitives and evolutions; each box represents an evolution object and the arrows between them describe inputs/outputs; (b) the evolutionary behavior table for clusters when adding or deleting a core post p
6.6 The trends of the number of core posts, core edges and events when increasing δ from 0.3 to 0.8, with δ = ε = 0.3 as the 100% basis
6.7 Examples of Google Trends peaks in January 2012; we validate the events generated by cTrack by checking for volume peaks at a nearby time moment in Google Trends; although these peaks can detect bursty events, Google Trends cannot discover the merging/splitting patterns
6.8 Lists of the top 10 events detected from the Twitter Technology stream in January 2012 by the baselines HashtagPeaks, UnigramPeaks and Louvain, and by our incremental tracking approach eTrack
6.9 The merging and splitting of "SOPA" and "Apple"; at each moment, an event is annotated by a word cloud; Baselines 1 and 2 only detect newly emerging events and cannot track merging and splitting dynamics; the evolution trajectories of eTrack and Baseline 3 are depicted by solid and hollow arrows, respectively
6.10 The running time on two datasets as the time window length and step length are adjusted
7.1 Crowdsourcing task for Hypothesis 1
7.2 Crowdsourcing task for Hypothesis 2
7.3 The workflow of quality control steps
7.4 (a) A voting example with 3 workers and 3 questions, where each question has two options, A and B; (b) the computation of the normalized quality vector q on each iteration, where quality scores on iteration 0 are obtained from the qualification test; (c), (d) the distributions of quality weights among options for each question on iterations 1 and 15, respectively
7.5 Qualification test sample used before the real crowdsourcing tasks
7.6 Running EMQ to verify Hypotheses 1 and 2; in (b), the percentage of weighted votes on options D/E increases from 74.4% (before EMQ) to 83.4% (after EMQ); in (c), the percentage of weighted votes on option D increases from 56.7% (before EMQ) to 75.4% (after EMQ)
Chapter 1

Introduction

As social streaming websites like Twitter become popular and gradually dominate people's digital life, the current Web has entered the social age. The information propagation channel in the social web age can be viewed as post streams along the timeline, where each post is, for instance, a tweet on Twitter. We call these post streams unstructured social streams, since there is no formal structure or schema between these posts or inside a post. Modeling and mining unstructured social streams has become a fundamental problem in social web analysis, which leads to numerous applications [48, 55, 66, 77], e.g., answering "what's happening now?" on Twitter. In this chapter, we first provide an overview of the modeling of unstructured social streams. Then, we discuss the opportunities, challenges, and our contributions in various mining tasks over unstructured social streams.

1.1 Unstructured Social Streams

In this section, we first scope out the definition and position of unstructured social streams in the architecture of a social website, and then briefly introduce the modeling of a social stream as a post network. Following that, we discuss the major tasks and challenges in social stream mining, and explain why we adopt a structural approach rather than other social stream mining approaches, such as content- or frequency-based ones.

1.1.1 Concept and Modeling

Social Website Components. While a precise and widely recognized definition of social streams is still missing in the literature, we try to refine its scope in the context of a social website. Here, a social website generally refers to a website that supports the interaction among people in a social network, in which they create and share information.
Twitter and Weibo are two typical social websites, where users are connected in virtual online communities, and short textual posts are created and shared among them.

[Figure 1.1: The illustration of major components in a social website.]

In Figure 1.1, we decompose a social website into two major components: the post stream and interactions. Each of them is explained below.

• Post Stream. In social websites, posts are the atomic form of the content created by users and propagated across social networks. The length of each post is usually very short, e.g., a tweet has at most 140 characters. Post streams are also unstructured, since the majority of posts do not have any relationships between them, and there is no schema for the content of a post. In the most popular social websites, the volume of posts surges very quickly; e.g., Twitter reached a peak of 143,199 tweets per second on August 3, 2013 (https://blog.twitter.com/2013/new-tweets-per-second-record-and-how).

• Interactions. The interactions in a social website may happen between different types of objects. The typical interaction happens between a user and a post, e.g., a user creates a post. Interactions may also happen between two posts. For example, a "reply" is an interaction between two posts, and two posts without a "reply" relationship may still interact semantically if they tell the same story.
Unstructured Social Streams. Conceptually, we call the textual post streams on a social website, e.g., daily updates, social streams. We describe a social post by three types of information: author, text content and creation time. We define a social stream as a first-in-first-out queue of posts ordered by time of creation, in which each post is associated with its text content and a timestamp. An illustration of a social stream in a time window can be found in Figure 1.2. Mining social streams is highly challenging, with the difficulties originating from two important properties of social streams:

[Figure 1.2: The similarity between posts in a time window of a social stream is captured by a post network. Each story or event in the social stream can be modeled as a specifically defined subgraph of the post network.]

• Unstructured. The unstructured nature of social streams has two aspects. First, the text content inside a post is unstructured. Social posts such as tweets are usually written in an informal way with lots of grammatical errors; even worse, a correctly written post may have no significance and be just noise. The design of a processing strategy that can quickly judge what a post talks about is a challenging problem. Second, the relationships between posts are usually unstructured. Our statistics on the Twitter Tech-Full data set (described in Section 5.5) show that only 5% of tweets are retweets or replies, making the majority of tweets independent of each other, even though they may talk about similar events or stories on the social website. The effective organization of these unstructured posts in social streams is a very challenging problem.

• Streaming. The streaming nature of social streams also poses two challenges. First, in popular social websites, new posts emerge quickly every second; e.g., Twitter generates thousands of tweets per second.
The design of social stream mining algorithms should be efficient enough to handle the quick surge of new posts in a timely manner; e.g., it should use a single-scan algorithm with linear scalability. Second, since it is impossible and unnecessary to process all historical posts, old posts should naturally become outdated as time rolls on. The design of a time window with a proper time-decay effect is essential for handling the streaming data. This time window can be regarded as the window of observation.

Modeling Social Streams as Post Networks. In this thesis, we propose an innovative approach to quickly transform an unstructured social stream into a well-defined structure, called a post network. As illustrated in Figure 1.2, we monitor social streams using a sliding time window with length Len. At any moment t, all posts generated in the time window [max{t − Len, 0}, t] constitute the snapshot of the current social stream. We transform a social stream into a post network by the following rule: the post network at moment t is a graph Gt(Vt, Et), where each node p ∈ Vt is a post in the snapshot, and an edge (pi, pj) ∈ Et is constructed if the similarity S(pi, pj) between pi and pj is higher than a given threshold ε. As the time window moves forward, new posts flow in and old posts fade out, so Gt(Vt, Et) is dynamically updated at each moment, with new nodes/edges added and old nodes/edges removed. This transformation makes Gt(Vt, Et) an evolving network as time rolls on.

The key challenge in constructing the post network is how to compute S(pi, pj) effectively and efficiently. Traditional similarity measures such as TF-IDF based cosine similarity, the Jaccard coefficient and Pearson correlation [53] only consider the post content. However, timestamps should also play an important role in determining post similarity, since posts created close together in time are more likely to discuss the same story than posts created at very different moments. We therefore define the similarity between a pair of posts pi and pj by combining both content similarity and temporal proximity. The exact form of the similarity function will be discussed in Chapter 3. The similarity score S(pi, pj) falls into the interval [0, 1], and the similarity threshold ε is set empirically, where 0 < ε < 1.
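Since the exact similarity function is deferred to Chapter 3, the following minimal Python sketch only illustrates the construction rule above: it combines a bag-of-words cosine with an exponential temporal-proximity factor and thresholds the product at ε. The whitespace tokenizer, the exponential decay form and the scale parameter tau are placeholder assumptions for illustration, not the dissertation's actual choices.

    import math
    from collections import Counter
    from itertools import combinations

    def post_similarity(p, q, tau=3600.0):
        # Content part: cosine similarity over bag-of-words counts.
        a = Counter(p["text"].lower().split())
        b = Counter(q["text"].lower().split())
        dot = sum(c * b[w] for w, c in a.items())
        norm = (math.sqrt(sum(c * c for c in a.values()))
                * math.sqrt(sum(c * c for c in b.values())))
        content = dot / norm if norm > 0 else 0.0
        # Temporal part: proximity decays with the time gap (scale tau, assumed).
        proximity = math.exp(-abs(p["time"] - q["time"]) / tau)
        return content * proximity  # S(p, q) stays in [0, 1]

    def build_post_network(posts, eps=0.3):
        # Nodes are post indices; an edge exists iff S(p_i, p_j) > eps.
        adj = {i: set() for i in range(len(posts))}
        for i, j in combinations(range(len(posts)), 2):
            if post_similarity(posts[i], posts[j]) > eps:
                adj[i].add(j)
                adj[j].add(i)
        return adj

Each post here is a plain dict such as {"text": "crimea votes to join russia", "time": 1394064000}; a realistic implementation would at least use TF-IDF weights and an inverted index to avoid the quadratic pairwise loop.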
1.1.2 Mining Unstructured Social Streams

Major Tasks. People easily feel overwhelmed by the information deluge coming from social streams, which flow in from channels like Twitter, Facebook, LinkedIn, forums, blog websites and mailing lists. There is thus an urgent need for tools which can automatically extract and summarize significant information from highly dynamic social streams, e.g., report emerging bursty events, or track the evolution of one or more specific events in a given time span. Imagine that a user called Bob follows a few news media (e.g., the CNN Breaking News channel), which feed him a social stream consisting of thousands of tweets per day. Bob does not have time to digest these tweets one by one, but he wants to stay synced up with the new emerging stories diffused on these information channels. The major tasks of social stream mining can be viewed as answering the following typical queries on social streams:

• What's trending now? Every morning at breakfast, Bob wants to stay updated on the new events or stories emerging in his social streams. He got thousands of new tweets pushed from his followees during the night, and he does not have the time or interest to read so many short and informally written posts.

• Tell me related news. One morning in 2014, Bob is reading breaking news about the "Annexation of Crimea by Russia". Bob is unclear about the context of this story and wants to check more related stories.

• How're things going? Bob is interested in the event "Ukraine Crisis" and follows it for several months. Every week, new stories related to the Ukraine crisis happen. In January 2014, Bob is on vacation and does not follow this event any more. When he is back from his vacation, Bob wants to know how the Ukraine crisis event evolved during the past month.

Answering these key queries requires scoping out two questions: (1) how do we effectively organize the meaningful information in posts? (2) How do we capture the behavior of this meaningful information in social streams? For the first question, we define story and event as two structures that organize the meaningful information in posts. For the second question, we introduce cohesion, context and evolution to capture the evolution behaviors of events and stories. They are elaborated below.

Story and Event. Before developing intelligent algorithms to answer these queries, we need to define several basic concepts that support them. In this dissertation, story and event are the two basic concepts we use to aggregate and organize posts in social streams that carry similar meaningful information. In related work [55, 66, 67], there are many different definitions of story and event, and some studies even use the two terms interchangeably, without proper distinction. In the following, we clearly distinguish them and define story and event at the conceptual level.

• Story. A story is a set of posts that talk about very similar information in a short time span. Usually, the information carried by a story can be described by a few sentences or a small set of keywords. Since posts in social streams are usually very short, we consider that a post describes at most one story in most cases. However, a story can be described by many posts; e.g., the two posts "Crimea was annexed by the Russian Federation" and "Crimea voted on 6 March to formally accede as part of the Russian Federation" are talking about the same story, the annexation of Crimea by the Russian Federation in March 2014.

[Figure 1.3: An illustration of an event and its included stories. This event is about "MH370 Search" and has a clear evolution history, i.e., emerge, grow and decay, in a time span of three months. The event includes several stories which are related.]

• Event. An event is a post cluster which contains many highly related stories, where the relatedness between stories is measured by both content similarity and time closeness. Typically, an event has a relatively long time span and follows clear evolution patterns, e.g., emerging, growing and decaying. A typical event contains lots of details and cannot be described by a short post.
For example, "Ukrainian Crisis" is an event, which consists of a series of related stories, e.g., "Crimean Crisis", "War in Donbass", "Ukraine Presidential Elections", etc.

Since a social stream can be modeled as a post network, stories and events correspond to subgraphs in the post network. Figure 1.3 illustrates the relationship between an event and its included stories, where the stories are related to each other. Notice that in this section, stories and events are defined at the conceptual level; the detailed technical definitions of events and stories on the post network will be given in Chapter 3.

Cohesion, Context and Evolution. To capture the behaviors of events and stories, we briefly introduce story cohesion, story context and event evolution, as explained below. Their detailed studies appear in Chapters 4, 5 and 6.

• Story Cohesion. Given a set of posts which may correspond to a story, the cohesion of this post set measures the likelihood that the posts tell the same story. Since we can measure the similarity between any two posts, the cohesion of a post set can be computed by aggregating the pairwise post similarities inside the post set. For example, a common way to define cohesion is the ratio between the minimum degree of a post in the story and the story size. We only treat a post set with sufficiently high cohesion as a story. Obviously, cohesion is an intrinsic feature of a story.

• Story Context. In contrast to cohesion, story context is an extrinsic feature defined between stories. Story context tries to measure how strongly two given stories are related. Since we can compute the similarity between two posts in different stories, a natural way to measure the relatedness between two stories is to assess the post similarity between them, potentially normalized by the sizes of the stories.

• Event Evolution. An event usually consists of a series of stories over a relatively long time span (e.g., weeks or even months), with evolution patterns. Event evolution tries to capture the evolution path of an event; typical evolution patterns include emerge, grow, decay, disappear, etc.

Difficulties in Social Stream Mining. Social stream mining is a category of difficult data mining problems. The aim of social stream mining is to fulfill people's information-seeking needs on social streams. Typical social stream mining tasks include detecting new stories, searching for related stories and tracking the evolution of trending events. The design of social stream mining algorithms faces the following difficulties:

• Dynamics. The quick surge of social post streams makes the data update very frequently, so social stream mining algorithms should handle the dynamic nature of social streams effectively and efficiently.

• Quality. A large number of posts in social streams are noise or redundant. An effective social stream mining algorithm should be able to combat noise, condense redundant content, and aggregate the small pieces of information conveyed by individual short posts.

• Scalability. The huge number of posts generated by users every day raises major challenges for the scalability of social stream mining algorithms.
1.2 Opportunities and Challenges

Unstructured social streams are the primary data source for information exchange on social websites. They are noisy and surge quickly, with meaningful information hidden deep inside informally written text content. In this section, we discuss the opportunities and challenges in mining unstructured social streams. First, we define a story as a cohesion-persistent subgraph in the post network, which ensures that all social posts in the same story are highly related to each other. We then propose story context search to find the related stories of a given story, which allows us to build the relatedness between stories in social streams. An event is defined as a post cluster which contains many highly related stories. Tracking event evolution patterns is an interesting problem, and we discuss the challenges of performing event evolution tracking in social streams. These approaches are supported by two hypotheses: users prefer correlated results to individual results in social stream modeling, and a structural approach is better than frequency/LDA-based approaches in event and story modeling. We discuss the challenges in the user studies that verify these hypotheses.

1.2.1 Cohesion-Persistent Story Search

Since a post in a social stream can be modeled as a node and two posts with high similarity are connected by an edge, a social stream can be transformed into a post network. As a result, a story in a social stream corresponds to a connected subgraph of the post network. In the post network, we define a story as a connected subgraph Gi(Vi, Ei) with sufficiently high cohesion, where the cohesion C(Gi) is defined as the ratio between the minimum degree and the maximum possible degree in Gi. Writing deg(v, Gi) for the degree of node v in Gi, we have C(Gi) = min_{v ∈ Vi} deg(v, Gi) / (|Vi| − 1).

There are several reasons why we use cohesion to define a story. The root cause is that similar posts are connected in the post network. If two posts are similar in both text content and creation time, it is likely that they are telling the same story, and they will be connected by an edge in the post network. A story in a social stream is a subgraph of the post network; if the nodes in a subgraph are highly connected with each other, it is very likely that the posts corresponding to this subgraph tell the same story. We argue that cohesion is an effective way to guarantee that every post in a story is similar to the majority of posts in the same story, regardless of the story size.

Clearly, a story is a special kind of dense subgraph. In related work, there are many definitions of dense subgraphs, e.g., quasi-clique, k-Plex, k-Core [47] and k-Truss [34]. Writing N(p) for the neighbor set of node p ∈ Vi in Gi(Vi, Ei), these dense subgraphs are defined as follows (a brute-force checker for these conditions is sketched below):

• Quasi-clique: |N(p)| ≥ λ(|Vi| − 1) for every post p ∈ Vi, where 0 < λ ≤ 1;
• k-Plex: |N(p)| ≥ |Vi| − k for every post p ∈ Vi;
• k-Core: |N(p)| ≥ k for every post p ∈ Vi;
• k-Truss: |N(pi) ∩ N(pj)| ≥ k for every edge (pi, pj) ∈ Ei.

The cohesion of a k-Plex can be very low if its size is very close to k. The cohesion of a k-Core can also be very low if its size is much larger than k, and similarly for a k-Truss. Among these definitions, only the λ-quasi-clique keeps cohesion at least λ as the subgraph size changes. The intuition of a cohesion-persistent story is thus best captured by a λ-quasi-clique, as the required node degree grows with the number of nodes in a quasi-clique.
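To make these conditions concrete, here is a direct Python checker for the cohesion measure and for the quasi-clique, k-Core and k-Truss conditions on a candidate post set. It reuses the adjacency-map representation from the earlier sketch; the exhaustive set operations are for exposition only, not how Chapter 4 evaluates candidates.

    def cohesion(nodes, adj):
        # C(G_i) = (minimum degree inside the subgraph) / (|V_i| - 1).
        nodes = set(nodes)
        if len(nodes) < 2:
            return 1.0
        return min(len(adj[v] & nodes) for v in nodes) / (len(nodes) - 1)

    def is_quasi_clique(nodes, adj, lam):
        # lambda-quasi-clique: every node is adjacent to >= lam * (|V_i| - 1)
        # members, i.e., cohesion stays at least lam regardless of story size.
        return cohesion(nodes, adj) >= lam

    def is_k_core(nodes, adj, k):
        # k-Core: every node keeps at least k neighbors inside the subgraph.
        nodes = set(nodes)
        return all(len(adj[v] & nodes) >= k for v in nodes)

    def is_k_truss(nodes, adj, k):
        # k-Truss: the endpoints of every internal edge share >= k common
        # neighbors inside the subgraph (assumes comparable node ids).
        nodes = set(nodes)
        return all(len(adj[u] & adj[v] & nodes) >= k
                   for u in nodes for v in (adj[u] & nodes) if u < v)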
Given several posts already read by the user as input, finding the cohesion-persistent story containing these posts is called the cohesion-persistent story search problem. On the post network, the cohesion-persistent story search problem is in fact a query-driven quasi-clique search problem.

To the best of our knowledge, there are no existing studies on the query-driven quasi-clique search problem. Existing studies on quasi-clique maximization (without any query) are mainly based on local search [15, 33, 62], in which a solution moves to the best neighboring solution iteratively, updated node by node. To apply local search to the query-driven quasi-clique search problem, we face two challenges: (1) efficiently finding an initial solution, and (2) efficient iterative maximization approaches for searching for better neighboring solutions. In addition, existing local search methods are usually based on deterministic heuristics, which can easily trap the optimization process in a local maximum. Thus, we face the new challenge of developing randomized algorithms to find the largest query-driven quasi-cliques.

1.2.2 Story Context Mining

There are many previous studies [48, 55, 66, 77] on detecting newly emerging stories from social streams. Since the stories detected by these studies are not cohesion-persistent, we proposed the cohesion-persistent story in Section 1.2.1. However, all these story detection approaches only serve the need of answering "what's happening now?" in social streams, and are not able to find the relatedness between stories. In reality, stories usually do not happen in isolation, and the recommendation of related stories will greatly enhance the user's experience and improve participation. For example, "Crimea votes to join Russia" (on March 6, 2014) and "President Yanukovych signs compromise" (on February 21, 2014) are two separate stories, but they are actually highly related under the same event "Ukraine Crisis". Context-aware story-telling for streaming social content not merely detects trending stories in a given time window of observation, but also builds the "context" of each story by measuring its relatedness with other stories on the fly. As a result, context-aware story-telling has an advantage in answering advanced user queries like "tell me related stories", which is crucial for digesting large-volume social streams. Context-aware story-telling on social streams raises the following challenges:

• Identification of transient stories from the time window. Story detection should be robust to noisy posts and efficient enough to support single-pass tracking, which is essential in the streaming environment.

• Story context search on the fly. Story relatedness computation should be efficient, to find the related stories of a given story, and interpretable, to build a story graph that supports story-telling to users.

To the best of our knowledge, there is no publicly available training data set for context-aware story-telling on social streams, which makes the existing studies [59, 68] on Story Link Detection (SLD) inapplicable, because SLD is trained on well-written news articles. Furthermore, we cannot apply topic tracking techniques (e.g., [32]) to story context search, because topic tracking is usually formulated as a classification problem [4], under the assumption that topics are predefined before tracking, which is unrealistic for story context search on social streams. All these constraints make the mining of story contexts on social streams an extremely challenging problem.
1.2.3 Event Evolution Tracking

People easily feel overwhelmed by the information deluge coming from highly dynamic social streams. There is thus an urgent need to provide users with tools which can automatically extract and summarize significant information from social streams, e.g., report emerging bursty events, or track the evolution of one or more specific events in a given time span. In this thesis, an event is defined as a group of posts that contains many related stories. There are several previous studies [10, 26, 48, 55, 66, 67] on detecting newly emerging events from text streams, designed to answer simple queries like "what's trending now?". However, in many scenarios, users are dissatisfied with being shown only newly emerging events. Instead, users may want to know the evolution history of an event, and like to issue advanced queries like "how're things going?". The ideal answer to such a query would be a "panoramic view" of the event, which greatly improves the user experience. Here, the panoramic view of an event means the whole evolution life cycle of the event, including primitive operations like emerge, grow, decay and disappear, and composite operations like merge and split. Technically, we can model social streams as dynamically evolving post networks and model events as clusters in these networks, obtained by means of a clustering approach that is robust to the large amount of noise present in social streams. Accordingly, we consider the above kind of queries as an instance of the event evolution tracking problem, which aims to track the evolution patterns of events at each moment in such dynamic post networks.

In many scenarios, social streams are large-scale and evolve quickly. There are several major challenges in event evolution tracking:

• The first challenge is the effective design of an incremental computation framework for event evolution tracking. Traditional approaches (e.g., [38]) based on decomposing a dynamic network into snapshots and processing each snapshot independently from scratch are prohibitively expensive. An efficient single-pass incremental computation framework is essential for event evolution tracking over social streams that exhibit very large throughput rates. To our knowledge, the event evolution problem has not yet been studied.

• The second challenge is the formalization and tracking of event evolution operations under an incremental computation framework, as the network evolves. Most related work reports event activity by volume over the time dimension [48, 55]. While certainly useful, this is simply not capable of showing composite evolution behaviors, for instance how events split or merge.

• The third challenge is the handling of bulk updates. Since dynamic post networks may change rapidly, a node-by-node approach to incremental updating leads to poor performance. A subgraph-by-subgraph approach to incremental updating is critical for achieving good performance over very large, fast-evolving dynamic networks such as post networks. But this in turn brings the challenge of incremental cluster maintenance against bulk updates on dynamic networks.
1.2.4 Evaluation of Mining Tasks

The approaches proposed in this dissertation are based on two hypotheses: users prefer related results to individual results in social stream modeling, and a structural approach is better than frequency/LDA-based approaches in event and story modeling. A user study is an effective way to verify a hypothesis about user preferences and satisfaction. Traditional user studies involve a lot of human labor. With the rise of the Internet, crowdsourcing has recently become a popular mechanism behind user studies, and Amazon Mechanical Turk (MTurk) is the most popular crowdsourcing marketplace. As recorded in 2014 (http://docs.aws.amazon.com/AWSMechTurk/latest/RequesterUI/OverviewofMturk.html), it has over 500,000 workers from over 190 countries. Besides the normal MTurk workers, it is a well-known fact that there are many spammers and bots among Amazon workers.

In this user study, we perform hypothesis verification for social stream mining through crowdsourcing tasks on MTurk. Given that there are a large number of bots and spammers on MTurk, the most critical challenge is quality control in crowdsourcing. Existing techniques for quality control include majority voting, minimum time constraints, etc. However, none of them solves the "smart spammer" problem, in which a worker passes the qualification test but then performs like a spammer, simply to get the reward with minimal work. In particular, since these smart spammers are still qualified workers for crowdsourcing tasks, none of the existing approaches can detect them effectively. All these facts make worker quality control in crowdsourcing a very challenging problem.

1.3 Contributions and Research Plan

Section 1.2 explained the challenges in cohesion-persistent story search, story context mining, event evolution tracking and quality control in user studies. In this section, we briefly introduce the main ideas behind the solutions we propose to conquer these challenges, and summarize the contributions made in this dissertation.

1.3.1 Query-Driven Quasi-Clique Maximization

As discussed in Section 1.2.1, the cohesion of a k-Plex can be very low if its size is very close to k, and the cohesion of a k-Core or k-Truss can also be very low if its size is much larger than k. Thus, the λ-quasi-clique is the best definition for cohesion-persistent stories in the post network, since every post in a λ-quasi-clique is similar to at least a fraction λ (e.g., λ = 0.9) of the other posts. Recall that two posts are similar if they talk about similar content and are generated at close time moments. Given a query S which is a set of posts, the problem of finding the largest cohesion-persistent story that contains the query S is formalized as the query-driven maximum quasi-clique (QMQ) search, which aims to find the largest λ-quasi-clique containing S. The QMQ search problem is a new graph mining problem not studied before, and we prove it to be NP-Hard and inapproximable. To solve this problem, we propose the notion of a core tree to organize dense subgraphs recursively, which reduces the search space and effectively helps find a solution within a few tree traversals. To optimize a currently available solution into a better one, we introduce three maximization operations: Add, Remove and Swap (a simplified flavor of this local search is sketched below). We propose two iterative maximization algorithms, DIM and SUM, to approach QMQ by deterministic and stochastic means, respectively. With extensive experiments on real datasets, we demonstrate that our algorithms significantly outperform the state-of-the-art algorithms in running time and/or quality.

We make the following contributions:

• We define the problem of query-driven maximum quasi-clique search, a novel cohesive subgraph query not studied before, to solve the cohesion-persistent story search problem.

• We propose the core tree as a recursive representation of a graph, which helps quickly find a tentative solution to the QMQ search problem within a few tree traversals by greatly reducing the solution search space.

• We introduce Add, Remove and Swap to search for new solutions and efficiently optimize a tentative solution into a better neighboring solution. Building on this, we propose deterministic and stochastic iterative maximization algorithms for QMQ search: DIM and SUM.

• We perform an extensive experimental study on three real datasets, which demonstrates that our algorithms significantly outperform several baselines in running time and/or quality.

We will discuss the details of the QMQ search in Chapter 4.
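The actual DIM and SUM algorithms, with their core-tree pre-solutions and ranked Add/Remove/Swap operations, are developed in Chapter 4. Purely to convey the local-search flavor, the sketch below applies greedy Add steps only: it keeps absorbing a frontier node as long as the enlarged set remains a λ-quasi-clique containing the query. It reuses is_quasi_clique from the earlier sketch, and this naive loop can stall in exactly the local maxima that Remove, Swap and randomization are designed to escape.

    def grow_quasi_clique(query, adj, lam, seed=None):
        # Start from the query posts (optionally plus a pre-solution seed) and
        # greedily Add frontier nodes while the lambda-quasi-clique condition holds.
        solution = set(query) | set(seed or [])
        improved = True
        while improved:
            improved = False
            frontier = set().union(*(adj[v] for v in solution)) - solution
            for cand in frontier:
                if is_quasi_clique(solution | {cand}, adj, lam):
                    solution |= {cand}
                    improved = True
                    break  # rescan from the enlarged solution
        return solution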
1.3.2 Context-Aware Story-Telling

Mining the transient stories and their relatedness implicit in social streams is a challenging task, since these streams are noisy and surge quickly. To address the challenges discussed in Section 1.2.2, we propose CAST [43], a Context-Aware Story-Teller specifically designed for streaming social content. CAST takes a noisy social stream as input, and outputs a "story vein", a human-digestible and evolving summarization graph that links highly related stories together. More precisely, we model the social stream as a post network, and define stories by a new cohesive subgraph type called the (k, d)-Core in the post network, in which every node must have at least k neighbors and the two end nodes of every edge must have at least d common neighbors. The (k, d)-Core is more compact than the k-Core, and thus a better definition for stories. Unlike quasi-cliques, whose decision problem is NP-Hard, (k, d)-Cores can be detected in polynomial time (a naive peeling sketch is given below). We propose deterministic and randomized context search to support the iceberg query, which builds the relatedness between stories as social streams flow; we call the resulting relatedness graph between stories a story vein. We perform a detailed experimental study on real Twitter streams, and the results demonstrate the creativity and value of our approach.

The main contributions of CAST are summarized below:

• We define a new cohesive subgraph called the (k, d)-Core to represent transient stories, and propose two efficient algorithms, Zigzag and NodeFirst, to identify maximal (k, d)-Cores from the post network;

• Given a story, we propose deterministic and randomized context search to support the iceberg query for highly related stories, which builds the story vein on the fly;

• Our experimental study on real Twitter streams shows that the story vein can be digested and effectively helps build an expressive context-aware story-teller on streaming social content.

We will discuss the details of CAST in Chapter 5.
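Chapter 5's Zigzag and NodeFirst are the efficient algorithms; the naive fixpoint peeling below is only meant to make the (k, d)-Core definition operational. It repeatedly drops edges whose endpoints share fewer than d common neighbors and nodes with fewer than k remaining neighbors, until both conditions hold; the adjacency-map representation follows the earlier sketches, and comparable node ids are assumed.

    def kd_core(adj, k, d):
        # Peel the graph down to its maximal (k, d)-Core (possibly empty):
        # every surviving node has >= k neighbors, and the endpoints of every
        # surviving edge have >= d common neighbors, inside the result.
        g = {v: set(nbrs) for v, nbrs in adj.items()}
        changed = True
        while changed:
            changed = False
            for u in list(g):                        # edge condition first
                for v in list(g[u]):
                    if u < v and len(g[u] & g[v]) < d:
                        g[u].discard(v)
                        g[v].discard(u)
                        changed = True
            for v in [x for x in g if len(g[x]) < k]:  # then node condition
                for u in g[v]:
                    g[u].discard(v)
                del g[v]
                changed = True
        return g

Deleting a node can invalidate the common-neighbor condition of surviving edges (and vice versa), which is why the two peeling passes iterate to a fixpoint.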
1.3.3 Incremental Event Evolution Tracking

In this thesis, an event is a post cluster which may contain many related stories. Since an event usually has a relatively long life cycle (e.g., weeks or even months), tracking event evolution patterns in social streams can greatly help understand and summarize the event over time. To implement event evolution tracking, we propose Incremental Cluster Evolution Tracking (ICET, [44]), which focuses on tracking the evolution patterns of clusters in highly dynamic networks. There are several previous works on data stream clustering that use a node-by-node approach to maintaining clusters. However, the handling of bulk updates, i.e., a subgraph at a time, is critical for achieving acceptable performance over very large, highly dynamic networks. We therefore propose a subgraph-by-subgraph incremental tracking framework for cluster evolution. To effectively illustrate the techniques in our framework, we consider the event evolution tracking task in social streams as an application, where a social stream and an event are modeled as a dynamic post network and a dynamic cluster, respectively. Monitoring through a fading time window, we introduce a skeletal graph to summarize the information in the dynamic network, and formalize cluster evolution patterns using a group of primitive evolution operations and their algebra (a coarse snapshot-based illustration is sketched at the end of this subsection). Two incremental computation algorithms are developed to maintain clusters and track evolution patterns as time rolls on and the network evolves. Our detailed experimental evaluation on large Twitter datasets demonstrates that our framework can effectively track the complete set of cluster evolution patterns in highly dynamic networks on the fly.

In summary, the problem we study is captured by the following question: how can we incrementally and efficiently track the evolution behaviors of clusters in large-scale networks which are noisy and highly dynamic? Our main contributions are the following:

• We propose an incremental computation framework for cluster evolution on highly dynamic networks;

• We filter out noise by introducing a skeletal graph, based on which we define a group of primitive evolution operations for nodes and clusters, and introduce their algebra for incremental tracking;

• We leverage incremental computation by proposing two algorithms based on bulk updating: ICM for incremental cluster maintenance and eTrack for cluster evolution tracking;

• Our application to event evolution tracking in large Twitter streams demonstrates that our framework can effectively track all kinds of cluster evolution patterns in highly dynamic networks in real time.

The details of ICET will be discussed in Chapter 6.
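ICET itself works incrementally, consuming only the updating subgraph ΔG rather than full snapshots. As a coarse, snapshot-based illustration of what the evolution patterns mean, the sketch below matches the cluster sets of two consecutive moments by node overlap and labels emerge, grow, decay, disappear, merge and split. The overlap threshold and the labeling rules are illustrative simplifications, not ICET's evolution algebra.

    def label_evolution(prev_clusters, curr_clusters, share=0.3):
        # Clusters are sets of post ids; a cluster at time t is a "parent" of a
        # cluster at t+1 if it contributes at least `share` of the child's posts.
        patterns = []
        for c in curr_clusters:
            parents = [p for p in prev_clusters if len(p & c) >= share * len(c)]
            if not parents:
                patterns.append(("emerge", c))
            elif len(parents) > 1:
                patterns.append(("merge", parents, c))
            elif len(c) >= len(parents[0]):
                patterns.append(("grow", parents[0], c))
            else:
                patterns.append(("decay", parents[0], c))
        for p in prev_clusters:
            children = [c for c in curr_clusters if len(p & c) >= share * len(p)]
            if not children:
                patterns.append(("disappear", p))
            elif len(children) > 1:
                patterns.append(("split", p, children))
        return patterns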
1.3.4 Targeted Crowdsourcing

The approaches proposed in this dissertation are based on two hypotheses: users prefer correlated posts to individual posts in social stream modeling, and a structural approach is better than frequency/LDA-based approaches in event and story modeling. We use crowdsourcing-based user studies to verify these two hypotheses. To ensure that the crowdsourcing-based user studies have high quality, we use multiple techniques to control the quality of workers, as explained below (an EM-style sketch of the final step follows the list).

• In the beginning, we set a Minimum Time Constraint and an Approval Rate Constraint to filter out the MTurk workers who provide answers within a very short time or have a very low historical approval rate. Most bots and spammers are removed in this step.

• For the remaining workers, we perform a qualification test, which is a series of questions with known answers. These qualification questions are treated as the gold standard, and a worker's qualification is measured by the ratio of questions answered correctly. If the ratio is higher than a predefined threshold, the worker is treated as a qualified worker.

• All qualified workers then submit their work on the real crowdsourcing tasks. Since there are no known answers for these tasks, the quality of workers is measured by cross-comparison with peers in an iterative way, which is captured by Expectation-Maximization with Qualification (EMQ). EMQ is capable of measuring a user's quality in crowdsourcing and punishing smart spammers among the qualified workers by assigning low quality scores to them. By interpreting these probabilities as users' quality scores, EMQ achieves better performance than other competing approaches.

The details of targeted crowdsourcing will be explained in Chapter 7.
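Chapter 7 defines the actual EMQ updates; the sketch below is only a generic EM-style reweighting in the same spirit: option weights on each question are the quality-weighted votes, and a worker's quality becomes her average agreement with that weighted consensus, initialized from the qualification test. It assumes every worker answered at least one question, and the variable names are illustrative.

    def emq(votes, init_quality, iters=15):
        # votes[w][q] = option chosen by worker w on question q;
        # init_quality[w] = score from the qualification test (iteration 0).
        quality = dict(init_quality)
        for _ in range(iters):
            # E-step: weight each option of each question by its supporters' quality.
            consensus = {}
            for w, answers in votes.items():
                for q, opt in answers.items():
                    consensus.setdefault(q, {}).setdefault(opt, 0.0)
                    consensus[q][opt] += quality[w]
            # M-step: a worker's new quality is her mean share of the weight that
            # landed on the options she picked; workers who consistently disagree
            # with the weighted majority (smart spammers) drift toward low scores.
            for w, answers in votes.items():
                shares = [consensus[q][opt] / sum(consensus[q].values())
                          for q, opt in answers.items()]
                quality[w] = sum(shares) / len(shares)
        return quality

For example, emq({"w1": {"q1": "A", "q2": "B"}, "w2": {"q1": "A", "q2": "A"}, "w3": {"q1": "A", "q2": "A"}}, {"w1": 0.9, "w2": 0.8, "w3": 0.8}) pulls w1's score down as the weighted consensus on q2 settles on option A.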
DStream [18] uses an online component, which maps each input data record into a grid, and an offline component, which generates grid clusters based on density. Another related work is by Kim et al. [38], which first clusters individual snapshots into quasi-cliques and then maps them over time by looking at the density of bipartite graphs between quasi-cliques in adjacent snapshots. Although [38] can handle the birth/growth/decay/death of clusters, it is not incremental, and split and merge patterns are not supported. In contrast, our event tracking approach on dynamic post networks is incremental and is able to track composite behaviors like merging and splitting.

2.1.2 Community Detection and Search

Communities in graphs can be defined from global or local perspectives [25]. A community in the global sense is a dense subgraph with very few ties to the outside of the subgraph, measured by "modularity", a function which evaluates the goodness of a graph partitioning. The Louvain method [14], based on modularity optimization, is the state-of-the-art community detection approach and outperforms others. However, the Louvain method is not robust to noise, such as meaningless posts like "Good night :)" in social streams. Local definitions of communities focus on the subgraph under study while neglecting the rest of the graph. A clique is a very strict definition of community, in which every member is a friend of every other member. However, finding cliques in a graph is an NP-complete problem. It is possible to relax the notion of clique by defining a community as a clique-like structure, as we discuss further in the related work on dense subgraph mining.

On the application level, our query-driven quasi-clique search shares some intuitions with community search ([19, 71]), which finds the communities containing a querying set of people. However, since the definitions of communities in [71] and [19] are very different from a quasi-clique, a comparison between them and our query-driven quasi-clique search is not applicable.

2.1.3 Dense Subgraph Mining

Typical dense subgraphs studied in the literature include the densest subgraph [73], clique, k-Plex, quasi-clique and k-Core [47]. The densest subgraph is a subgraph that maximizes the average degree. It can be found in polynomial time by solving a parametric maximum-flow problem [73], while finding the densest subgraph with a fixed size is known to be NP-Hard [7]. The maximum clique problem is a well-known NP-Hard problem. In real applications, the clique definition is too strict, making it unlikely for large cliques to exist in practical graphs; e.g., there may be a large subgraph where most, but not all, node pairs are adjacent. This has motivated relaxations of cliques, popular examples of which include k-Plex, quasi-clique, and k-Core. Since the degree constraints of k-Plex and quasi-clique are correlated with the size |Vi|, the NP-Hardness of the maximum clique problem carries over to k-Plexes and quasi-cliques [8]. In contrast, k-Core generation takes polynomial time [47]. However, if the size of a k-Core is distinctly larger than k, the average degree of the k-Core may be very low, and in this case the k-Core is not compact enough to describe a story. In this dissertation, we define a new dense subgraph called (k, d)-Core to overcome these challenges.

Maximal/Maximum Quasi-clique Detection.
By definition, a maximal quasi-clique is a quasi-clique that is not a subgraph of any other quasi-clique in the given graph. The maximum quasi-clique is the quasi-clique with the largest number of nodes among all quasi-cliques in the given graph. The size of a maximal quasi-clique may be much smaller than the size of the maximum quasi-clique.

A negative breakthrough result by Arora et al. [6], together with results of Feige et al. [23] and, more recently, Hastad [31], shows that no polynomial time algorithm can approximate the maximum clique problem within a factor of n^{1−ε} (ε > 0), unless P = NP. Thus, it is very unlikely that general heuristic algorithms can provide results with guaranteed optimality for the maximum clique problem. Related prior work on maximal/maximum quasi-clique detection is typically based on local search ([1, 15, 33, 50, 61, 62]). In particular, Abello et al. [1] developed efficient semi-external memory algorithms for GRASP [64] to extract maximal quasi-cliques. Brunato et al. [15] extended two existing stochastic local search algorithms for the classical maximum clique problem to the maximal quasi-clique problem. Pattillo et al. [62] established several fundamental properties of the maximum quasi-clique problem, but their quasi-clique is defined using edge density: D(Gi) = 2|Ei| / (|Vi|(|Vi| − 1)) ≥ λ. Based on depth-first search, Liu et al. [50] proposed an efficient algorithm called Quick to find maximal quasi-cliques using several pruning techniques, and Uno et al. [74] proposed a reverse search method to enumerate all quasi-cliques. However, none of them studied the query-driven quasi-clique problem as defined in Chapter 4. To the best of our knowledge, this thesis is the first study of the query-driven quasi-clique search problem.

Notice that there is another definition of quasi-clique based on edge density, D(Gi) = 2|Ei| / (|Vi|(|Vi| − 1)) ≥ λ, studied in [1, 61, 74]. We do not adopt this definition because it has the potential to introduce undesired low-degree nodes into quasi-cliques: a quasi-clique with edge density D(Gi) ≥ λ cannot prevent the occurrence of a node in Gi with very low degree. In contrast, we use the definition that every node in a quasi-clique should have degree at least λ · (|Vi| − 1). It is easy to show that a λ-quasi-clique under our node degree definition is at least a λ-quasi-clique under the edge density definition, but the converse does not hold. Thus, our definition is stronger than the edge density-based definition. Besides, Tsourakakis et al. [73] proposed a new dense subgraph called the optimal quasi-clique, and defined the constrained optimal quasi-clique problem of finding the optimal quasi-clique containing a given node. However, their optimal quasi-clique (Problem 2 in [73]) is defined as the subgraph Gi(Vi, Ei) that maximizes |Ei| − λ · |Vi|(|Vi| − 1)/2, which is a fundamentally different problem from popular definitions of quasi-cliques based on edge density or node degree.

2.1.4 Subgraph Relatedness Computation

In this thesis, we model a story as a dense subgraph of posts, and the relatedness between stories can be measured by the relatedness between subgraphs. The relatedness between nodes in a graph is a well-studied problem, with popular algorithms such as HITS, Katz and Personalized PageRank [11]. For the relatedness between dense subgraphs, traditional measures like the Jaccard coefficient, cosine similarity and Pearson's correlation [53] are not effective if the dense subgraphs have no overlapping nodes.
Recently, a propagation and aggregation process was used to simulate the information flow between nodes, which was studied in the context of top-k structural similarity search in [45] and authority ranking in [49].

2.2 Social Media Analytics

In related work, there are two main research directions in social media analysis. The first direction is frequency-based approaches, which treat each social post as a statistical unit and use histogram-based analysis to mine patterns in social media, e.g., bursty events. The second direction is entity-based approaches, which extract entities from each social post and build a network of entities from social media for event identification. They are introduced separately below.

2.2.1 Frequency Based Peak Detection

Most previous works detect events by discovering topic bursts from a document stream. Their major techniques either detect frequency peaks of event-indicating phrases over time in a histogram, or monitor the formation of a cluster from a structural perspective. A feature-pivot clustering is proposed by Fung et al. [26] to detect bursty events from text streams. Sarma et al. [67] design efficient algorithms to discover events from a large graph of dynamic relationships. Weng et al. [77] build signals for individual words and apply wavelet analysis on word frequencies to detect events from Twitter. A framework for tracking short, distinctive phrases (called "memes") that travel relatively intact through on-line text was developed by Leskovec et al. [48]. Twitinfo [55] represents an event it discovers from Twitter by a timeline of related tweets. Marcus et al. [54] present TweeQL, a streaming SQL-like interface to the Twitter API, making common tweet processing tasks simpler. Sakaki et al. [66] investigated the real-time interaction of events such as earthquakes in Twitter and proposed an algorithm to monitor tweets and detect a target event based on classifiers.

2.2.2 Entity Network Based Approaches

Recently, Agarwal et al. [2] discover events that are unraveling in microblog streams by modeling events as correlated keyword graphs. Angel et al. [5] study the maintenance of dense subgraphs with size smaller than a threshold (e.g., 5) under streaming edge weight updates. Both [2] and [5] model the social stream as an evolving entity graph, but suffer from certain drawbacks: (1) many posts are ignored in the entity recognition phase; (2) post attributes like time and author cannot be integrated; (3) they cannot handle subgraph-by-subgraph bulk updates, which are key to efficiency. These drawbacks are addressed by the post network defined in this dissertation.

2.3 Topic Detection and Tracking

Topic detection and tracking is an extensively studied field [51], with the most common approaches based on Latent Dirichlet Allocation (LDA) [13]. Techniques for topic detection and tracking cannot be applied to story relatedness tracking, because they are usually formulated as a classification problem [4] under the assumption that topics are predefined before tracking, which is unrealistic for social streams. Recent works (e.g., [32]) suffer from this problem. Besides, the lack of training data for story relatedness tracking on noisy social streams renders the existing works [59, 68] on Story Link Detection (SLD) inapplicable, because SLD is trained on well-written news articles. Jin et al.
[37] present Topic Initiator Detection (TID) to automatically find which web document initiated a topic on the Web. In text streams, Hierarchical Dirichlet Processes (HDP, [27]) have been proposed to track and connect topics incrementally. Since HDP is computed from the document-word matrix, it is difficult to integrate HDP-based approaches with other signals, e.g., time stamps, GPS, authors, etc.

There is less work on evolution tracking. A framework for tracking short, distinctive phrases (called "memes") that travel relatively intact through on-line text was developed in [48]. The evolution of communities in dynamic social networks is tracked in [79]. However, these existing works cannot track composite evolution patterns of communities, e.g., the merging or splitting of communities. Moreover, all existing works have to re-compute communities from each network snapshot, which is time-consuming and results in a lot of redundant computation. Unlike them, we focus on the incremental tracking of cluster evolution patterns in highly dynamic networks, where we maintain each cluster by gradually adding or removing nodes.

Chapter 3

Modeling Unstructured Social Streams

This chapter focuses on the modeling of social streams, which is the first step in social stream mining. We introduce the social stream preprocessing techniques in Section 3.2. The construction of the post network from a social stream is discussed in Section 3.3. We define stories and events, from both the social stream perspective and the post network perspective, in Section 3.4. We provide a comparison between the structural modeling used in this dissertation and other modeling methods in Section 3.6.

3.1 Motivation

Social websites like Twitter have a great impact on many people's digital lives. As social content streams fast, it can easily lead to "information anxiety", the gap between the information we receive and the information we are able to digest [40]. The current generation of information-seeking on social media works just like traditional search on web pages: users input several keywords, and the output is a long list of tweets or posts containing the keywords, ranked by time freshness. For instance, Twitter Search (https://twitter.com/search-home) returns a huge list of posts for a given keyword query, with little aggregation or semantics, and leaves it to the users to sift through the large collection of results to figure out the very small portion of useful information. Since a post like a tweet only contains a small piece of information, users are required to manually aggregate and digest search results, which is time-consuming and painful. The noisy and redundant nature of social streams degrades the user experience further.

On the other hand, since a post like a tweet only conveys a very small piece of information, it would be ideal if we could group the posts talking about the same information together. In this dissertation, we define "story" and "event" as two kinds of post structures that organize posts telling similar information together. We aim to build the next generation of social stream mining technologies, which provide users an organized and summarized view of what's happening in the social world.
Instead of showing users a long list of posts, our new social stream mining technologies present users with well-organized stories and events.

3.2 Social Stream Preprocessing

In a social media service like Twitter, new posts emerge every second and old posts naturally become outdated as time rolls on. Posts in social streams, such as tweets, are usually written in an informal way. To design a processing strategy that can quickly and robustly extract the meaningful information of a post, we focus on the entity words contained in a post, since entities depict the topic. For example, given the tweet "iPad 3 battery pointing to thinner, lighter tablet?", the entities are "iPad", "battery" and "tablet". However, traditional Named Entity Recognition tools [21] only support a narrow range of entities like locations, persons and organizations; [5] reported that only about 5% of tweets have more than one named entity. NLP parser based approaches [39] are not appropriate due to the informal writing style of posts and the need for high processing speed. To broaden the applicability, we treat each noun in the post text as a candidate entity. Technically, we obtain nouns from a post text using a Part-Of-Speech Tagger (http://nlp.stanford.edu/software/tagger.shtml), and if a noun is plural (POS tag "NNS" or "NNPS"), we obtain its singular form. In practice, we find this preprocessing technique to be robust and efficient. In the Twitter dataset we used in experiments (see Section 6.7), each tweet contains 4.9 entities on average. We describe a post p as a triple (L, τ, u), where pL is the list of entities, pτ is the time stamp, and pu is the author. We formally define a post and a social stream as follows; a preprocessing sketch follows the definitions.

Definition 1 (Post). A post p is a triple (L, τ, u), where L is the list of entities in the post, τ is the time stamp when the post is generated, and u is the user who created it.

We let pL denote L in the post p for simplicity, and analogously for pτ and pu. We use |pL| to denote the number of entities in p.

Definition 2 (Social Stream). A social stream is a first-in-first-out queue of posts ordered by time of arrival, in which each post p is represented as a triple (L, τ, u) as defined in Definition 1.
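As an illustration of this preprocessing step, the following is a minimal sketch of noun-based entity extraction. NLTK is used here as a convenient stand-in for the Stanford POS Tagger cited above, and the plural-to-singular step is a naive suffix rule standing in for proper lemmatization; both substitutions are assumptions for illustration only.

```python
# A minimal sketch of the preprocessing step: extract candidate entities
# (nouns) from a tweet. NLTK stands in for the Stanford POS Tagger here,
# and the plural handling is a naive placeholder for a real lemmatizer.
import nltk  # assumes the 'punkt' and perceptron-tagger models are installed

def extract_entities(text):
    tokens = nltk.word_tokenize(text)
    entities = []
    for word, tag in nltk.pos_tag(tokens):
        if tag in ("NN", "NNP"):          # singular nouns kept as-is
            entities.append(word.lower())
        elif tag in ("NNS", "NNPS"):      # plural nouns -> naive singular form
            entities.append(word.lower().rstrip("s"))
    return entities

# e.g., extract_entities("iPad 3 battery pointing to thinner, lighter tablet?")
# -> ['ipad', 'battery', 'tablet'] (up to tagger differences)
```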
3.3 Post Network Construction

Our modeling method for social streams in this dissertation is based on constructing a network of posts and maintaining the network over a moving time window, as posts stream in and fade out. This network is used for subsequent analysis. Essentially, our modeling method is based on the hypothesis that the correlation between posts should be considered, and that grouped posts provide a better user experience than individual posts. In this section, we describe the construction of the post network based on post correlations.

3.3.1 Post Similarity Computation

Our initial data analysis shows that post topics do not have a strong correlation with authors on Twitter. In other words, a typical Twitter user may create posts with different topics at different moments. Fortunately, we found that post topics are highly correlated with entities and time: posts talking about the same topic typically have similar entities and very close time stamps.

Traditional similarity measures such as TF-IDF based cosine similarity, the Jaccard coefficient and Pearson correlation [53] only consider the post content. However, time stamps should clearly play an important role in determining post similarity, since posts created closer together in time are more likely to discuss the same event. We introduce the notion of fading similarity to capture both content similarity and time proximity. For example, with the Jaccard coefficient as the underlying content similarity measure, the fading similarity is defined as

    SF(pi, pj) = |pLi ∩ pLj| / (|pLi ∪ pLj| · e^{|pτi − pτj|})    (3.1)

We use an exponential function to incorporate the decaying effect of the time lapse between the posts. The unit of time difference is ∆t, typically in hours. It is easy to see that 0 ≤ SF(pi, pj) ≤ 1 and that SF(pi, pj) is symmetric. Fading similarity thus scores the similarity between two posts by a value between 0 and 1, assessed from both post content and time: if two posts share many common entities and their posting times are very close, they are similar.
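Eq. (3.1) translates directly into code. Below is a small sketch, assuming posts are (L, τ, u) triples per Definition 1 with τ expressed in the time unit ∆t (hours); the sample posts are made-up values for illustration.

```python
# A direct sketch of Eq. (3.1): Jaccard content similarity discounted by an
# exponential in the time gap. Posts follow Definition 1 as (L, tau, u)
# triples; tau is assumed to be measured in hours (the unit delta-t above).
import math

def fading_similarity(p1, p2):
    L1, tau1, _ = p1
    L2, tau2, _ = p2
    L1, L2 = set(L1), set(L2)
    union = len(L1 | L2)
    if union == 0:
        return 0.0
    return len(L1 & L2) / (union * math.exp(abs(tau1 - tau2)))

p1 = (["ipad", "battery", "tablet"], 0.0, "alice")
p2 = (["ipad", "tablet", "apple"], 0.5, "bob")
print(fading_similarity(p1, p2))  # 2 / (4 * e^0.5) ~ 0.30, always in [0, 1]
```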
3.3.2 Post Network

To find the correlations between posts, we build a post network G(V,E) based on the following rule: if the fading similarity between two posts (pi, pj) is higher than a given threshold λ, we create an edge e(pi, pj) between them and set the edge similarity s(pi, pj) = SF(pi, pj). Obviously, a lower λ retains more semantic similarities but results in much higher computation cost; we set λ = 0.3 empirically on Twitter streams to balance edge sparsity against information richness. Consider a time window of observation and the post network at its beginning. As we move forward in time and new posts appear and old posts fade out, G(V,E) is dynamically updated at each moment, with new nodes/edges added and old nodes/edges removed. On the scale of Twitter streams, with millions of tweets per hour, G(V,E) is truly a large and fast-changing dynamic network. The formal definition of the post network is given below.

Definition 3 (Post Network). Given two posts pi, pj in a social stream Q and a threshold ε (0 < ε < 1), there is an edge between pi and pj if the post similarity s(pi, pj) ≥ ε. The post network corresponding to Q is denoted as G(V,E), where each node p ∈ V is a post in Q, and each edge (pi, pj) ∈ E is constructed if the similarity s(pi, pj) ≥ ε.

Intuitively, an edge in the post network connects two posts if they are similar enough. The post network can be viewed as a structural representation of the original unstructured social stream, organizing meaningful information out of noisy buzz. In particular, posts with very few edges can essentially be treated as noise and ignored.

3.3.3 Linkage Search

Removing a node and its associated edges from G(V,E) is an easy operation. In contrast, when a new post pi appears, it is impractical to compare pi with each node pj in Vt to verify whether SF(pi, pj) > λ, since the number of nodes |Vt| can easily grow to millions. To solve this problem, we first construct a post-entity bipartite graph, and then perform a two-step random walk process to get the hitting counts. The main idea of linkage search is to let a random surfer start from post node pi and walk to any entity node in pLi on the first step, and then walk back to posts other than pi on the second step. All the posts visited on the second step form the candidate set of pi's neighbors. Supposing the average number of entities in each post is d1 and the average number of posts mentioning each entity is d2, linkage search can find the neighbor set of a given post in time O(d1d2). In our Twitter dataset, d1 and d2 are usually below 10, which supports the construction of a post network on the fly.
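To make the mechanics concrete, here is a minimal sketch of linkage search. An entity-to-posts inverted index plays the role of the post-entity bipartite graph, the two-step walk amounts to collecting the posts reachable through shared entities, and candidates are then verified against the threshold λ. It reuses fading_similarity from the sketch above; the class and method names are hypothetical.

```python
# A sketch of linkage search: an inverted index from entities to posts plays
# the role of the post-entity bipartite graph, and the two-step walk amounts
# to collecting posts reachable through shared entities, then verifying the
# fading-similarity threshold lambda.
from collections import defaultdict

class PostIndex:
    def __init__(self, lam=0.3):
        self.lam = lam
        self.posts = {}                       # post id -> (L, tau, u)
        self.by_entity = defaultdict(set)     # entity -> ids of posts using it

    def neighbors(self, pid, post):
        """Candidates via the two-step walk, filtered by fading similarity."""
        L, _, _ = post
        candidates = set()
        for e in L:                           # step 1: post -> entities
            candidates |= self.by_entity[e]   # step 2: entities -> posts
        candidates.discard(pid)
        return {q for q in candidates
                if fading_similarity(post, self.posts[q]) > self.lam}

    def add(self, pid, post):
        nbrs = self.neighbors(pid, post)      # O(d1 * d2) on average
        self.posts[pid] = post
        for e in post[0]:
            self.by_entity[e].add(pid)
        return nbrs                           # the post's new network edges
```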
3.4 Stories and Events

Story and event are two concepts we use in this dissertation to aggregate and organize posts carrying similar meaningful information. In related work, there are many different definitions for story and event, and some studies even use the two terms interchangeably, without properly distinguishing them. In this section, we distinguish and define stories and events dually on two levels: the social stream level and the post network level, as explained below.

3.4.1 Stories and Events in Social Streams

In social streams, both a story and an event are a set of posts talking about very similar information. Their main difference is in the granularity of the information they carry: a story is assumed to talk about only a single thing, while an event is assumed to talk about many things with high relatedness. Thus, we can say that an event contains many highly related stories. For example, we consider "Ukrainian crisis" an event happening from November 2013 to May 2014, which contains many stories such as "the annexation of Crimea by the Russian Federation" and "War in Donbass" in March 2014. In the following, we give the definitions of stories and events in social streams.

Definition 4 (Story in Social Stream). A story is a set of posts that talk about a single topic in a short time span.

Definition 5 (Event in Social Stream). An event is a set of posts that talk about a series of highly related topics.

Usually, the information carried by a story can be described by a few sentences or a small set of keywords. Since posts in social streams are usually very short, we consider that a post describes at most one story in most cases. However, a story can be described by many posts; e.g., the two posts "Crimea was annexed by the Russian Federation" and "Crimea voted on 6 March to formally accede as part of the Russian Federation" are talking about the same story. An event is a post cluster which contains many highly related stories, where the relatedness between stories is measured by both content similarity and time closeness. Typically, an event has a relatively long time span and follows clear evolution patterns, e.g., emerging, growing and decaying. A typical event contains lots of details and cannot be described by a short post. For example, "Ukrainian crisis" is an event which consists of a series of related stories, e.g., "Crimean Crisis", "War in Donbass", "Ukraine President Elections", etc.

Definitions 4 and 5 are given at the conceptual level. There are several open questions for these two definitions, some of which are listed below:

• Given a set of posts, how do we determine whether these posts talk about a story, an event, or neither?
• How do we quantify the relatedness of two stories?
• How do we measure the evolution patterns of an event?

Since social streams are unstructured and difficult to analyze, the answers to these questions depend on the modeling of social streams. In previous sections, we discussed the modeling of a social stream as a post network. In the following, we explain stories and events with reference to a post network.

3.4.2 Stories and Events in Post Network

In Section 3.3, we discussed the construction of a post network from a social stream: a post in the social stream is modeled as a node, and two posts with high similarity are connected by an edge. As a result, a story or an event in a social stream corresponds to a subgraph in the post network. The subgraph corresponding to a story or an event should be connected, since we assume posts in a story or an event carry very similar information.

Cohesion. Let us start from the definition of a story in the post network. Given a connected subgraph Gi(Vi, Ei) of the post network, how can we determine whether this subgraph corresponds to a story or not? The answer is to make use of a measure called "cohesion", as defined below.

Definition 6 (Cohesion). Given a connected subgraph Gi(Vi, Ei), its cohesion C(Gi) is defined as the ratio between the minimum degree and the maximum possible degree in Gi. Supposing deg(v,Gi) is the degree of node v in Gi, we have

    C(Gi) = min_{v∈Vi} deg(v,Gi) / (|Vi| − 1)    (3.2)

Why is cohesion so important for defining a story? The reason is that similar posts are connected in the post network. If two posts are similar to each other in both text content and creation time, it is likely that the two posts are telling the same story, and they will be connected by an edge in the post network. A story in a social stream is a subgraph in the post network; if the nodes in a subgraph are highly connected with each other, it is very likely that the posts corresponding to this subgraph tell the same story. To guarantee that posts in the same story are similar to each other, we consider the coreness, edge density and cohesion of the post subgraph. Among these three, the cohesion defined in Eq. (3.2) is the most effective in ensuring that every post in the story is similar to the majority of posts in the same story. Compared with the coreness K(Gi) of a k-Core, where K(Gi) ≥ k, and the edge density D(Gi) = 2|Ei| / (|Vi|(|Vi| − 1)), cohesion ensures that every node connects to the majority of other nodes regardless of the story size, while K(Gi) and D(Gi) cannot guarantee that.
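As a small sketch of Eq. (3.2), the function below computes the cohesion of a subgraph given as an adjacency dict restricted to the subgraph's nodes; the representation is an assumption chosen here for illustration.

```python
# A one-function sketch of Eq. (3.2): cohesion = minimum degree divided by
# the maximum possible degree |Vi| - 1 of the connected subgraph.
def cohesion(adj):
    """adj: {node: set of neighbors inside the subgraph}."""
    n = len(adj)
    if n < 2:
        return 0.0
    return min(len(nbrs) for nbrs in adj.values()) / (n - 1)

triangle = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}}
print(cohesion(triangle))  # 1.0: a clique has the maximum possible cohesion
```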
Cohesion-Persistent Story. The first instantiation defines a story as a connected subgraph Gi(Vi, Ei) with sufficiently high cohesion. Clearly, a story is a special kind of dense subgraph. In related work, there are many definitions of dense subgraphs, e.g., quasi-clique, k-Plex, k-Core [47] and k-Truss [34]. Supposing deg(v,Gi) is the degree of node v in Gi, these dense subgraphs are defined as:

• Quasi-clique: deg(v,Gi) ≥ λ(|Vi| − 1) for every node v ∈ Vi and λ ∈ (0, 1];
• k-Plex: deg(v,Gi) ≥ |Vi| − k for every node v ∈ Vi;
• k-Core: deg(v,Gi) ≥ k for every node v ∈ Vi;
• k-Truss: for every edge (vi, vj) ∈ Ei, |N(vi) ∩ N(vj)| ≥ k.

Of these, only for λ-quasi-cliques is the cohesion always at least λ as the subgraph size changes. The intuition of a cohesion-persistent story is therefore best captured by a λ-quasi-clique, as the required node degree grows with the node count of a quasi-clique. Given several posts already read by the user as input, finding the cohesion-persistent story containing these posts is called the cohesion-persistent story search problem, discussed in Chapter 4. Formally, we define a cohesion-persistent story below.

Definition 7 (Cohesion-Persistent Story). A cohesion-persistent story Gi(Vi, Ei) in a post network G(V,E) is defined as a λ-quasi-clique, where 0 < λ ≤ 1 and, for every node v ∈ Vi, we have deg(v,Gi) ≥ λ(|Vi| − 1).

(k, d)-Core Story. Definition 7 provides a theoretically ideal way to define a story. However, notice that finding the maximum clique or quasi-clique in a graph is an NP-Hard problem [8, 15]. Even worse, [31] proved that there are no polynomial algorithms that provide any reasonable approximation to the maximum clique problem. Since a clique is a special case of a quasi-clique or k-Plex, these hardness results carry over to the maximum quasi-clique and maximum k-Plex problems. There are many heuristic algorithms that provide no theoretical guarantee on quality [1, 8, 15]. Since most of them are based on local search [15], they do not scale to large networks, because local search optimizes a solution by iteratively moving to a better neighboring solution in an exponential search space.

Although quasi-cliques are theoretically ideal for instantiating a story, the intractability of quasi-clique computation makes it difficult to apply this definition to real-world large post networks. In Chapter 4, we will discuss the cohesion-persistent story search problem, which aims to approach the maximum quasi-clique containing given nodes using heuristic rules. Due to its performance cost, however, this approach cannot solve the problem of finding all stories in the post network. These challenges motivate us to seek an alternative instantiation of a story, one which is cohesive enough and efficient to compute on real-world post networks.

The good news is that k-Cores can be found exactly in polynomial time. By adjusting k, we can generate k-Cores with any desired cohesion score k/(|VS| − 1). For example, increasing k improves the cohesion, because the minimum degree is increased and |VS| is decreased at the same time. However, k-Cores are not always capable of capturing our intuition about stories: although each post has at least k neighbors, the fact that two posts are connected by a single capillary may not be strong enough evidence that they tell the same story. Sometimes posts share some common words but discuss different stories, e.g., "Google acquires Motorola Mobility" and "Bell Mobility acquires Virgin Mobile". To address this challenge, we make a key observation: the existence of more common neighbors between two connected posts suggests a stronger commonality in story-telling. Supposing pi and pj are connected by an edge and N(pi) and N(pj) are the neighbor sets of pi and pj respectively, if there exists a post pl ∈ N(pi) ∩ N(pj), we call pl a witness for the post similarity s(pi, pj). We capture this intuition in the following definition, where we formalize a story as a (k, d)-Core; a peeling sketch follows the definition.

Definition 8 ((k, d)-Core Story). A (k, d)-Core story in the given post network G(V,E) is defined by a maximal (k, d)-Core Gi(Vi, Ei), where k, d are numbers with k > d > 0 and
• Gi(Vi, Ei) is a connected subgraph;
• For every post p ∈ Vi, |N(p)| ≥ k;
• For every edge (pi, pj) ∈ Ei, |N(pi) ∩ N(pj)| ≥ d.
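The following is a sketch of a peeling routine matching the two local conditions of Definition 8: it repeatedly drops edges with fewer than d witnesses (common neighbors) and nodes with fewer than k neighbors until both conditions hold. Extracting the maximal connected (k, d)-Cores would then be a connected-components pass over what remains, which is omitted here; this is an illustrative sketch, not the optimized algorithm of Chapter 5.

```python
# A peeling sketch for the (k, d)-Core conditions of Definition 8.
def kd_core(adj, k, d):
    """adj: {node: set of neighbors}; returns the pruned adjacency dict."""
    adj = {u: set(nbrs) for u, nbrs in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for u in list(adj):
            for v in list(adj[u]):
                if len(adj[u] & adj[v]) < d:         # too few witnesses
                    adj[u].discard(v)
                    adj[v].discard(u)
                    changed = True
        for u in list(adj):
            if len(adj[u]) < k:                      # degree below k
                for v in adj.pop(u):
                    adj[v].discard(u)
                changed = True
    return adj
```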
Density-Based Event. An event organizes a set of posts talking about many related things; in particular, an event may include multiple related stories. In the post network, an event is also modeled as a connected subgraph. Unfortunately, the various cohesive subgraph options we discussed for story modeling are not suitable for defining an event, since these cohesive subgraphs have a clear center and are designed to describe mainly a single thing. An ideal definition of events should allow multiple centers, and these centers should be highly related.

In this dissertation, we consider a graph cluster (also called a partition) the best way to define an event. The intuition is that, since the post network is constructed by pairwise post similarity, a graph cluster with high internal connectivity and low external connectivity indicates that the posts inside the cluster talk about very related things. Various clustering approaches can be applied to a post network to extract clusters; among them, we choose density-based clustering [30] as the best modeling for events. In density-based clustering (e.g., DBSCAN [20]), the threshold MinPts is the minimum number of nodes in an ε-neighborhood required to form a cluster. We adapt this and use a weight threshold δ as the minimum total weight of neighboring nodes required to form a cluster. The reason we choose density-based approaches is that, compared with partitioning-based approaches (e.g., K-Means [30]) and hierarchical approaches (e.g., BIRCH [30]), density-based methods such as DBSCAN define clusters as areas of higher density than the remainder of the data set, which is effective in finding arbitrarily-shaped clusters and is robust to noise. Moreover, density-based approaches are easy to adapt to single-pass clustering. In the post network, we treat ε as a similarity threshold that decides connectivity, which can be used to define the weight of a post in density-based clustering. Following density-based clustering, nodes in the post network are distinguished into three types (core posts, border posts and noise posts) by a threshold δ applied to the post weight. Formally, we define an event on the post network below; a classification sketch follows the definition.

Definition 9 (Density-Based Event). An event in the post network G(V,E) is a cluster C obtained by density-based clustering using density parameters ε and δ, where ε is a similarity threshold that removes an edge if its similarity is lower than ε, and δ is the minimum weight of a core node.

The details of density-based clustering for event identification can be found in Section 6.4.1.
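Below is a sketch of the DBSCAN-style post classification behind Definition 9: edges below ε are ignored, a post whose neighboring edge weights sum to at least δ is a core post, a non-core neighbor of a core post is a border post, and everything else is noise. The exact post-weight definition used in Chapter 6 may differ; summing edge similarities is an assumption made here for illustration.

```python
# A sketch of the core/border/noise classification behind Definition 9.
def classify_posts(edges, eps, delta):
    """edges: {(u, v): similarity}; returns {post: 'core'|'border'|'noise'}."""
    weight, nbrs = {}, {}
    for (u, v), s in edges.items():
        if s < eps:
            continue                      # epsilon prunes weak edges
        for a, b in ((u, v), (v, u)):
            weight[a] = weight.get(a, 0.0) + s
            nbrs.setdefault(a, set()).add(b)
    # posts with no retained edge never appear here, i.e., implicit noise
    label = {p: ("core" if w >= delta else "noise") for p, w in weight.items()}
    for p in list(label):
        if label[p] != "core" and any(label.get(q) == "core" for q in nbrs[p]):
            label[p] = "border"           # non-core neighbor of a core post
    return label
```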
3.5 Context and Evolution

As discussed in Chapter 1, detecting stories with high cohesion and tracking story context and event evolution patterns are the most important problems we focus on in this dissertation. The notion of cohesion has been defined in Definition 6. In this section, we introduce context for stories and evolution for events, respectively.

Story Context. Transient stories identified from the post network may be highly related; e.g., the two stories "the launch of Blackberry 10" and "BlackBerry Super Bowl ad" are highly related. We exploit and integrate signals from different perspectives (e.g., content and time) to compute the relatedness between stories. Here, we introduce the relatedness dimensions, which capture story relatedness from three perspectives and quantify it by a value in [0, 1].

• Content Similarity. By viewing a story as a document and a post entity as a term, existing document similarity measures can be used to assess story relatedness. However, TF-IDF based cosine similarity fails to be effective, since TF vectors of stories tend to be very sparse.
• Temporal Proximity. Stories that happen closer in time are more likely to correlate.
• Edge Connectivity. Edges between the posts of two stories can be used to determine story relatedness. This approach calculates the strength of edge connectivity between the posts of the two stories. These edges serve as a bridge between the stories, and the connectivity strength can be measured in various ways, e.g., by the Jaccard coefficient.

Story context search aims at efficiently finding the neighboring stories of a given story in the post network. In the database research literature, an iceberg query describes a kind of query that finds results with scores above a given threshold. We introduce story context search, which is a kind of iceberg query on stories, as stated below.

Definition 10 (Story Context Search). Given a set of stories 𝒮, a threshold γ (0 < γ < 1) and a story S, the story context search for S is to find the subset of stories 𝒮′ ⊆ 𝒮, where for each S′ ∈ 𝒮′, the relatedness Cor(S, S′) ≥ γ.

The computation of Cor(S, S′) will be discussed in Chapter 5.

Event Evolution. Event evolution happens as the time window moves over the social stream. We use E to denote an event and e to denote a snapshot of E at a specific moment; for simplicity, when we talk about event e, we mean event E at moment t. Let St denote the set of events at moment t. We analyze the evolutionary process of events at each moment and abstract it into four primitive patterns and two composite patterns. The four primitive patterns are emerge, disappear, grow and decay. The two composite patterns are merge and split, which can be decomposed into a series of emerge and disappear patterns. From moment t to t + 1, they are defined below.

• emerge: add event e to the event set St;
• disappear: remove event e from the event set St;
• grow: increase the size of e by adding new posts;
• decay: decrease the size of e by removing old posts;
• merge: remove a list of events {e1, e2, · · · , en} from St and add a new event e, where e = e1 + e2 + · · · + en;
• split: remove an old event e from St and add a list of events {e1, e2, · · · , en}, where e = e1 + e2 + · · · + en.

Compared with the primitive patterns, merge and split are not very common in event evolution. For a specific event, emerge/disappear and merge/split can only happen once, but grow/decay may happen at every moment.

In the following, we explain how to track event evolution incrementally as the post network gets updated; a case-analysis sketch follows. Conceptually, we call a post a core post if it is similar to many other posts. Suppose a post p is added to the post network. If p is a noise post, we simply ignore it. If p is a border post of a neighboring event e, grow e. If p is a core post with no neighboring event, a new event emerges. If p is a core post that is a neighbor of exactly one event e, grow e. If p is a core post that is a neighbor of multiple events {e1, e2, · · · , en}, merge them into a new event. The analysis of event evolution patterns for removing a post p from the post network is symmetric. We show the evolution patterns of the various cases in Table 3.1.

Cases                                                             Add      Remove
p is a noise post                                                 –        –
p is a border post of the neighboring event e                     grow     decay
p is a core post with no neighboring event                        emerge   disappear
p is a core post with exactly one neighboring event e             grow     decay
p is a core post with multiple neighboring events {e1, ..., en}   merge    split

Table 3.1: Event evolution patterns
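As a sketch of the add-post case analysis of Table 3.1, the function below returns the evolution pattern that the insertion of a post p triggers. The helper predicates is_core and neighboring_events are assumed to be supplied by the surrounding clustering machinery and are hypothetical names.

```python
# A sketch of the add-post cases in Table 3.1; is_core(p) and
# neighboring_events(p) are assumed helpers from the clustering layer.
def on_add(p, is_core, neighboring_events):
    events = neighboring_events(p)             # events adjacent to post p
    if not is_core(p):
        # a border post grows its neighboring event; pure noise is ignored
        return ("grow", events[0]) if events else ("ignore", None)
    if not events:
        return ("emerge", {p})                  # a new event appears around p
    if len(events) == 1:
        return ("grow", events[0])
    return ("merge", events)                    # several events fuse into one
```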
3.6 Comparing with Other Modeling Methods

In this thesis, we view each post as a node and the relationship between posts as an edge. In this way, a post stream can be transformed into a network evolving with time, which we call a post network. Events and stories can be viewed as substructures in this post network. However, in related work there are other alternative perspectives for social media search, in which content and frequency based approaches are the major players [13, 48]. In this section, we briefly compare them and show the advantages of the structure based approach. Note that an experimental study of the structural perspective versus the content and frequency perspectives will be extensively discussed in Chapters 5 and 6.

• Structure vs. LDA Approaches. From the content perspective, topic detection and tracking is an extensively studied field [51], with the most common approaches based on Latent Dirichlet Allocation (LDA) [13]. Techniques for topic tracking are usually formulated as a classification problem [4] under the assumption that topics are predefined before tracking, which is unrealistic for social streams. Recent works [27, 32] fall prey to this problem. The lack of training data for story relatedness tracking on noisy social streams renders the existing works [59, 68] on Story Link Detection (SLD) inapplicable, because SLD is trained on well-written news articles. In text streams, Hierarchical Dirichlet Processes (HDP, [27]) were proposed to track and connect topics incrementally. However, since HDP is computed from the document-word matrix, it is difficult to integrate HDP-based approaches with other signals, e.g., time stamps, GPS signals, authors, etc.

• Structure vs. Frequency. In related work, frequency based approaches are commonly used in story and event detection. Weng et al. [77] build signals for individual words by applying wavelet analysis to frequency based word signals to detect events from Twitter. A framework for tracking short, distinctive phrases (called "memes") that travel relatively intact through on-line text was developed in [48]. Twitinfo [55] represents an event it discovers from Twitter by a timeline of related tweets. [66] investigated the real-time interaction of events such as earthquakes in Twitter and proposed an algorithm to monitor tweets and detect a target event based on classifiers. To summarize, the common technique behind the above is detecting popular items based on frequency ranking, where the items are typically terms, hashtags or short phrases. Compared with frequency based approaches, our structural approach represents a story as a cohesive subgraph in the post network, which has a compact internal structure to describe the story and contains rich information. For example, frequency based approaches cannot track the merge or split of two events, while the structural approach can, by capturing the merge or split of two events as the merge or split of the underlying subgraphs in the post network.

We also perform a detailed user study using crowdsourcing in Chapter 7 to verify the hypothesis that the structural method is better than content and frequency based methods.
3.7 Discussion and Conclusion

Our modeling method for social streams in this dissertation is based on constructing a network of posts and maintaining the network over a moving time window. To track stories and events in social streams, our modeling method comprises the following steps (see the end-to-end sketch after this list):

• Post information extraction, in which we extract post text content and time stamps from social streams;
• Entity extraction, where entities in the post content are extracted using NLP tools;
• Post similarity computation based on entities and time stamps;
• Graph based algorithms to mine social patterns like stories and events.

A limitation of this modeling workflow is that we discard some information at each step. For example, in post information extraction, we ignore the author of a post, and in entity extraction, we use entities to represent post content. However, based on data analysis, we made sure the information we discard is less meaningful than the information we keep. For example, we ignore authors because data analysis shows that the post topics of a Twitter author are roughly random. We use entities to represent post content, rather than keywords or hashtags, because we found that keywords generate post networks with too many edges (i.e., too high an edge density), while hashtags generate post networks that are too sparse.

One drawback of our post similarity computation is that we ignore the word ambiguity problem. For example, "apple" may mean the Apple company or a kind of fruit; "Microsoft" and "MSFT" may mean the same thing. Disambiguating words in informally written tweets is a very hard problem in the NLP community, and we consider deep analysis of this problem beyond the research scope of this dissertation.

There is still room for improvement. First, the entities returned by the Stanford NLP tools are currently mostly nouns, which suggests that better entity recognition approaches could generate entities of higher quality; e.g., emotions and sentiments could be extracted as attributes of entities and provide more input to the similarity function. Second, deeper analysis of the sentence structure of tweets may yield a better post content similarity computation method. Third, since the current post similarity computation only combines content similarity and time proximity, better similarity functions could incorporate more meaningful signals. Fourth, it is possible to combine structural approaches with frequency or LDA-based approaches, which may result in a hybrid approach that is better than pure structural approaches.
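As a recap of the workflow above, the following hypothetical end-to-end snippet chains the earlier sketches (extract_entities, PostIndex) to turn a few raw tweets into post-network edges; the post ids, texts and time stamps are made-up values for illustration.

```python
# A hypothetical end-to-end run of the modeling steps above, chaining the
# earlier sketches: extract entities, form (L, tau, u) posts, and build
# post-network edges on the fly via linkage search.
tweets = [
    ("t1", "iPad 3 battery pointing to thinner, lighter tablet?", 0.0, "u1"),
    ("t2", "New iPad tablet rumored to ship with a bigger battery", 0.4, "u2"),
    ("t3", "Good night :)", 0.5, "u3"),   # likely noise: few shared entities
]

index = PostIndex(lam=0.3)
for pid, text, tau, user in tweets:
    post = (extract_entities(text), tau, user)
    edges = index.add(pid, post)          # neighbors found by linkage search
    print(pid, "->", edges)               # e.g., t2 -> {'t1'}, t3 -> set()
```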
Chapter 4

Cohesion-Persistent Story Search

A cohesion-persistent subgraph is a subgraph whose cohesion is higher than a given threshold λ (0 < λ ≤ 1). Recall that cohesion is defined as the ratio between the minimum degree and the maximum possible degree in a given subgraph (Definition 6). By this definition, a cohesion-persistent subgraph with threshold λ is a λ-quasi-clique. The quasi-clique is an appealing way to model a story in the post network, with the benefit that the cohesion of a λ-quasi-clique is always higher than λ (0 < λ ≤ 1), regardless of the size of the story. We call a story defined by a quasi-clique a cohesion-persistent story. In this chapter, we are interested in the search problem for cohesion-persistent stories: given a query set of posts, we try to find the maximum quasi-clique that contains this query set of posts. In applications, the query can be posts the user liked or found interesting/useful, and the answer will be the most popular story related to the query posts.

4.1 Introduction

Just like other real-world networks, the post network commonly has a skewed degree distribution, with the result that it features "dense subgraphs". Since we model a social stream as a post network, finding dense subgraphs of the post network corresponds to the story detection problem on social streams. While several alternative definitions have been proposed for dense subgraphs (e.g., see [25, 47]), quasi-cliques constitute an appealing way to model a story, since their cohesion is guaranteed to be above a given threshold. Specifically, a λ-quasi-clique is defined as a connected subgraph in which the ratio between the degree of each node and the highest possible degree is at least λ, where λ ∈ (0, 1]. There has been some prior work on finding quasi-cliques in graphs [15, 62, 63]. However, none of it can handle the quasi-clique search problem, which has not been studied before and has many real-world applications, especially cohesion-persistent story search on social streams. In particular, we focus on a new graph mining problem in this chapter, namely query-driven maximum quasi-clique search. Given a graph G(V,E), a set of query nodes S ⊆ V, and a parameter λ ∈ (0, 1], we are interested in the problem of finding the largest λ-quasi-clique containing the node set S.

The conceptual definition of a story is a set of posts telling the same thing. In the post network, this requires that every node have a sufficiently high connection strength to the other nodes in the same subgraph. To help determine whether a post set tells the same thing, we use the notion of "cohesion" for the ratio between the minimum degree and the highest possible degree in a connected subgraph. Clearly, the ideal definition of a story requires the cohesion to be persistent, or robust, regardless of the size of the subgraph. There are many definitions of dense subgraphs, e.g., densest subgraph [73], k-Plex, quasi-clique, k-Core [47] and k-Truss [34]. Of these, only for λ-quasi-cliques is the cohesion always at least λ as the subgraph size changes. Finding the largest cohesion-persistent subgraph containing a given node set can be modeled as the Query-driven Maximum Quasi-clique (QMQ) search problem. Since the maximum clique problem is NP-Hard [61] and no polynomial time algorithm can approximate it within a factor of n^{1−ε} (ε > 0) [31], it is not surprising that finding the maximum quasi-clique is also NP-Hard and not approximable in polynomial time. As for heuristic approaches, existing studies on quasi-clique maximization (without any query) are mainly based on local search [15, 33, 62], in which a solution moves to the best neighboring solution iteratively, updated node-by-node. However, none of the existing approaches is designed for quasi-clique search, and none can efficiently handle the QMQ problem. A detailed comparison with these and other related works appears in Section 2.1.3.
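Before discussing how to search for the QMQ, it helps to pin down what qualifies as a feasible solution. The sketch below checks the query-driven λ-quasi-clique condition for a candidate node set; the adjacency-dict representation is an assumption chosen for illustration.

```python
# A sketch verifying the query-driven quasi-clique condition: the candidate
# set must contain the query S, satisfy the degree bound, and be connected.
from collections import deque

def is_query_quasi_clique(adj, nodes, query, lam):
    """adj: {node: set of neighbors} for the full graph G; nodes, query: sets."""
    nodes = set(nodes)
    if not query <= nodes:
        return False                                  # must contain S
    # degree condition: deg(v) >= lam * (|VQ| - 1) inside the subgraph
    if any(len(adj[v] & nodes) < lam * (len(nodes) - 1) for v in nodes):
        return False
    # connectivity via BFS restricted to the subgraph
    start = next(iter(nodes))
    seen, queue = {start}, deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u] & nodes:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen == nodes
```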
Challenges. To the best of our knowledge, this chapter is the first study of the quasi-clique search problem. Solving this difficult problem raises the following challenges:

• Given a set of query nodes S, how do we find a λ-quasi-clique containing S as efficiently as possible?
• How can we design an effective and uniform objective function over solutions that guides the maximization of the λ-quasi-clique containing S?
• Given a solution, how do we devise efficient iterative maximization techniques to search for a better solution?
• As the iterative maximization is a local search process, how can we prevent the algorithm from being trapped in a local maximum?

Main Idea. In this chapter, we propose an efficient framework for the QMQ search problem, which has two components: an offline component that builds a tree representation T of the given graph G, and an online component that responds to the maximum quasi-clique search for a given set of query nodes S. In the offline component, we propose the core tree, a carefully designed hierarchical structure that helps find a pre-solution to the QMQ search problem in a few tree traversal operations, where the pre-solution is a maximal k-Core for some k, which contains the query nodes S and whose cohesion is very close to λ. The idea is that a λ-quasi-clique can be obtained from the pre-solution very quickly. The notion of k-Core, as defined in [47], is a connected subgraph in which every node has degree at least k. In our proposed core tree, G is the root and each tree node is a maximal k-Core, for some k; if a tree node Ti is a subgraph of another tree node Tj, we say Ti is a child of Tj. The core tree can be built recursively and efficiently, and the pre-solution can be quickly retrieved from the core tree.

In the online component, we introduce three maximization operations: Add, Remove and Swap. By scheduling these operations with different strategies, we propose two iterative maximization algorithms for QMQ search, called Deterministic Iterative Maximization (DIM) and Stochastically Updated Maximization (SUM). DIM is a deterministic approach that, in each iteration, greedily moves from the current solution to the best available neighboring solution. SUM, on the other hand, is a stochastic approach that moves from the current solution to one of its good neighboring solutions with probability proportional to the "marginal gain" associated with the move. Intuitively, SUM has the potential to escape local maxima and thus find better results than DIM, at the expense of potentially more iteration steps. Both DIM and SUM are efficient: DIM takes O(Tn) time, while SUM takes O(Tn^2) time per simulation, where T is the total number of iterations and n is the average solution size per iteration.

Contributions. We make the following contributions in this chapter:

• We define the problem of query-driven maximum quasi-clique search, a novel cohesive subgraph query with many real-world applications (Section 4.2).
• We propose the core tree as a recursive representation of a graph, which helps quickly find a pre-solution to the QMQ search problem within a few tree traversals by reducing the solution search space (Section 4.3).
• We introduce Add, Remove and Swap to search for new solutions and efficiently optimize a pre-solution into a λ-quasi-clique, as well as the current solution into a better neighboring solution. Building on this, we propose deterministic and stochastic iterative maximization algorithms for QMQ search, DIM and SUM (Section 4.4).
• We perform an extensive experimental study on three real datasets, which demonstrates that our algorithms significantly outperform several baselines in running time and/or quality (Section 4.5).

Section 4.6 concludes this chapter. The major notations used are listed in Table 4.1.

G(V,E)          the given network
S               the query node set
Gi(Vi, Ei)      a connected subgraph of G
QλS(VQ, EQ)     a query-driven quasi-clique
Q̄λS(V̄Q, ĒQ)     the query-driven maximum quasi-clique
T, Ti           core tree, tree node
deg(v, Gi)      degree of node v in subgraph Gi
F(Gi)           objective function for subgraph Gi

Table 4.1: Notation.
4.2 Problem Overview

Problem. Given a set of query nodes S, we define a query-driven quasi-clique w.r.t. S as a quasi-clique that contains all nodes in S. The formal definition follows, where for a subgraph Gi ⊆ G and a node v in Gi, we use deg(v,Gi) to denote the degree of v in Gi.

Definition 11 (Query-Driven Quasi-Clique). Given an undirected graph G(V,E), a set of query nodes S ⊆ V, and a parameter 0 < λ ≤ 1, a query-driven quasi-clique w.r.t. S and λ is a subgraph QλS(VQ, EQ) of G such that:
• QλS is connected and S ⊆ VQ ⊆ V;
• For every node v ∈ VQ, deg(v,QλS) ≥ λ · (|VQ| − 1).

For simplicity, we use QλS and Q interchangeably. The problem of query-driven maximum quasi-clique search is to find the largest λ-quasi-clique containing the query nodes S, as formalized below.

Figure 4.1: The query S is marked by solid nodes, with λ = 0.5. A maximal quasi-clique and the query-driven maximum quasi-clique (QMQ) are circled by a dotted ellipse and a solid ellipse, respectively.

Problem 1 Given an undirected graph G(V,E), a set of query nodes S ⊆ V and a parameter 0 < λ ≤ 1, find the subgraph Q̄λS(V̄Q, ĒQ) in G(V,E) such that

    Q̄λS(V̄Q, ĒQ) = arg max_Q { |VQ| | Q = QλS(VQ, EQ) }    (4.1)

We denote by Q̄λS(V̄Q, ĒQ) the query-driven maximum quasi-clique (QMQ) for the query S and parameter λ. Notice that it is possible that no qualified λ-quasi-clique contains all nodes in S, in which case we return NULL.

Hardness. It is well known that the problem of finding the maximum clique is NP-Hard [61]. Even worse, a negative breakthrough result by Arora et al. [6], together with results of Feige et al. [23] and, more recently, Hastad [31], implies that there is unlikely to be any polynomial time approximation algorithm for this problem within a factor of n^{1−ε} (ε > 0). According to Abello et al. [1], general heuristics which provide answers with guaranteed approximation to the maximum clique are unlikely to exist. Since a clique is a special case of a quasi-clique, these hardness results carry over to the maximum quasi-clique problem [62]. In particular, query-driven maximum quasi-clique (QMQ) search is also NP-Hard and inapproximable, since when S = ∅, QMQ search reduces to the traditional maximum quasi-clique problem.

Maximum vs. Maximal Quasi-Cliques. The QMQ search problem should not be confused with the extensively studied maximal quasi-clique problem. A maximal λ-quasi-clique cannot be a subgraph of any other λ-quasi-clique. However, its size may be arbitrarily smaller than that of the maximum quasi-clique. Fig. 4.1 illustrates these ideas with an example of a maximal λ-quasi-clique and the QMQ, both containing the query set S.
Solution Overview. The hardness of the QMQ search problem indicates that polynomial-time algorithms which provide answers with guaranteed optimality to QMQ are unlikely to exist. The hardness and inapproximability results for QMQ naturally motivate the quest for good heuristics. In related work, most algorithms for finding maximum as well as maximal quasi-cliques are based on local search [15, 33, 62]. However, traditional local search does not scale to large networks, because the search space of neighboring solutions in the iterative optimization increases exponentially with the network size. Thus, using traditional approaches, it is difficult to quickly find an initial solution corresponding to a quasi-clique containing S.

To address these challenges, in this chapter we propose an efficient framework for the QMQ search problem on large-scale networks. The architecture of this framework is illustrated in Fig. 4.2. There are two major components: an offline component which recursively identifies and organizes dense subgraphs in the given network, and an online component which responds to the QMQ search for query nodes S. The offline component, discussed in Section 4.3, transforms the network G into a dense subgraph (DSG) tree to support efficient retrieval of query-driven quasi-cliques. The online component quickly locates the tree node with the cohesion closest to λ (called the pre-solution) on the DSG tree for query nodes S and starts the iterative maximization process to approach the QMQ, as discussed in Section 4.4. In case no qualified solution is found, the algorithm returns NULL.

To simplify the presentation, we assume throughout the chapter that S ≠ ∅, i.e., the query node set is non-empty. This assumption does not result in any loss of generality. For the special case S = ∅, we can augment the graph G = (V,E) to a new graph G′ by adding a new node v to G with edges connecting v to every node in G. Let S′ = {v}. Then QλS is a λ-quasi-clique in G iff QλS′ is a λ-quasi-clique in G′, where QλS′ = QλS + {v}.

4.3 Pre-solution on Core Tree

In this section, we propose the core tree, a convenient representation of a graph in which each tree node is a certain dense subgraph. As we will see in Section 4.3.2, the core tree facilitates the quick look-up of a pre-solution, which is a connected subgraph that contains the query set S and has cohesion very close to λ. Meanwhile, we are able to confine the size of the QMQ to a small range between the size of a tree node and that of its parent, which accelerates the QMQ search process.

Figure 4.2: Architecture of the efficient query-driven maximum quasi-clique search by DSG tree.

4.3.1 Core Tree: Definition and Properties

In the following, we calibrate a dense subgraph using three measures and define the core tree for a given graph.

Measures. As remarked earlier, degree distributions of real-world networks tend to be skewed, making it common for such networks to have embedded dense subgraphs. Several notions of dense subgraphs have been proposed in the literature, including densest subgraphs, k-cores, k-plexes, and quasi-cliques [47, 73]. In principle, we could have defined a "dense subgraph (DSG) tree" representation of a graph using any of the above notions of dense subgraphs.
We provide the rationale for using k-Cores to define our DSG tree later in this section. To quantify the properties of a dense subgraph, we make use of the CCD measure, namely Coreness, Cohesion and Density, defined below.

Definition 12 (CCD Measure). Given a connected subgraph G_i(V_i, E_i), the CCD measure of G_i is defined as
• Coreness: K(G_i) = min_{v ∈ V_i} deg(v, G_i);
• Cohesion: C(G_i) = K(G_i) / (|V_i| − 1);
• Density: D(G_i) = |E_i| / C(|V_i|, 2) = 2|E_i| / (|V_i|(|V_i| − 1)).

Cohesion is the ratio between the minimum degree and the maximum possible degree. Clearly, a λ-quasi-clique is a connected subgraph with cohesion at least λ. Density, also called the local clustering coefficient [76], can be viewed as the ratio between the average degree and the maximum possible degree. Both cohesion and density fall in the range [0, 1].

Figure 4.3: A graph G with k-Cores identified recursively in (a) and its corresponding core tree in (b).

Definition. Conceptually, a core tree is a representation of the given graph as a recursively inclusive tree structure. Each tree node in a core tree is a maximal k-Core.

Definition 13 (Maximal k-Core). A maximal k-Core is a connected subgraph G_i(V_i, E_i) such that:
• K(G_i) ≥ k;
• G_i is not a subgraph of any other k-Core.

Definition 14 (Core Tree). Given an undirected graph G(V,E), the core tree of G is a tree T whose nodes correspond to maximal k-Cores of G, for some k, such that:
• Root: the root of T corresponds to G and has depth 0;
• Parent–Child: whenever T_i is a tree node of T at depth k corresponding to a maximal k-Core G_i, and G_j is a maximal (k+1)-Core that is a subgraph of G_i, then T_i has a child T_j corresponding to G_j; T_j has depth k + 1.

We show an example in Fig. 4.3 to illustrate the core tree. When k = 1, isolated nodes are removed. As k increases, the size of the k-Core decreases along the path from the root, making the cohesion monotonically increasing. Note that G is trivially a maximal 0-Core. It follows that every node of T at depth k is a maximal k-Core. It is possible that multiple tree nodes of T correspond to the same subgraph of G; e.g., if G has no isolated nodes, then G is a maximal 0-Core as well as a maximal 1-Core. This apparent redundancy in the core tree can be eliminated by concise storage, as introduced below.

Concise Storage. The height of a core tree T is the largest k for which G has a k-Core (called the degeneracy in [47]). This is usually a small number (e.g., below 100) for large real-world graphs [28]. Conceptually, since a tree node is a subgraph of its parent, a subgraph may appear repeatedly at multiple levels and introduce redundancy. However, there is no need to materialize any subgraphs. Instead, we can represent a tree node T_i corresponding to G_i(V_i, E_i) in the format

    T_i = (k, V_i, parent, children, leftover)

where k is the depth of T_i, parent and children respectively denote the ID and the set of IDs of the parent and children of the tree node T_i, V_i is the set of nodes in the subgraph of G that T_i corresponds to, and leftover is the set of nodes that are in V_i but not in any child. We can always induce G_i from G by V_i, and for simplicity we use T_i and G_i interchangeably. In addition, as an inverted index, for every node v in G we also remember the ID of the tree node T_i with the highest k that contains v.
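As an illustration of Definition 12 and of the concise storage just described, the sketch below is our own reconstruction; the field layout follows the (k, V_i, parent, children, leftover) record above, with tree-node IDs taken to be plain integers.

from dataclasses import dataclass, field
import networkx as nx

def ccd(G, nodes):
    """Definition 12: (coreness K, cohesion C, density D) of the subgraph
    of G induced by `nodes` (assumed connected, with at least 2 nodes)."""
    H = G.subgraph(nodes)
    n, m = H.number_of_nodes(), H.number_of_edges()
    K = min(dict(H.degree()).values())
    return K, K / (n - 1), 2 * m / (n * (n - 1))

@dataclass
class TreeNode:
    """Concise storage of one core-tree node: no subgraph is materialized;
    G_i is induced from G on demand via the node set V."""
    k: int                                          # depth = coreness level
    V: set                                          # node set of the maximal k-Core
    parent: int = -1                                # ID of the parent tree node (-1 at the root)
    children: list = field(default_factory=list)    # IDs of child tree nodes
    leftover: set = field(default_factory=set)      # nodes of V that are in no child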
Properties. Core trees enjoy many desirable properties by virtue of effectively organizing dense subgraphs at different granularity levels. These properties facilitate efficient search for the QMQ. We state and establish them below.

Proposition 1 If T_i and T_j are siblings on the core tree T, then V_i ∩ V_j = ∅.

Proof: Since T_i and T_j are siblings, their corresponding subgraphs G_i and G_j are both maximal k-Cores. Suppose V_i ∩ V_j ≠ ∅ and let G_k(V_k, E_k) be the union of G_i and G_j, i.e., V_k = V_i ∪ V_j and E_k = E_i ∪ E_j. Then G_k satisfies the definition of a k-Core, which contradicts the maximality of G_i and G_j; hence V_i ∩ V_j = ∅. □

Proposition 2 The cohesion of a tree node is non-decreasing along any root-to-leaf path of the core tree.

Proof: Based on Def. 14, K(G_i) is non-decreasing and |V_i| is non-increasing along such a path, so cohesion is non-decreasing. □

Proposition 3 Given any two nodes T_i and T_j on the core tree, if V_i ∩ V_j ≠ ∅ and |V_i| ≤ |V_j|, then V_i ⊆ V_j.

Proof: Since V_i ∩ V_j ≠ ∅, T_i and T_j are on the same path from the root to a leaf. Since |V_i| ≤ |V_j|, G_i must be a subgraph of G_j, so V_i ⊆ V_j. □

Prop. 1 states a disjointness property, which helps prune the search space: once we find that the query set S is contained in a tree node T_i, we can reduce the search space to the branch of T_i and safely ignore tree nodes on different root-to-leaf paths. Prop. 2 shows the monotonicity of cohesion along any path from the root to a leaf of the core tree. Prop. 3 presents an inclusion property: if two tree nodes share common graph nodes, they must be on the same root-to-leaf path. Both Prop. 2 and Prop. 3 are essential for quasi-clique maximization.

Why Core Trees? Popular dense subgraph definitions include the densest subgraph, k-Plex, clique, quasi-clique, k-Core and k-Truss ([34, 47, 73]). In the following, we briefly discuss each of them and argue why the k-Core is the best fit for instantiating a dense subgraph (DSG) tree. First, the monotonicity of cohesion shown in Prop. 2 is crucial for the fast search of the QMQ. The densest subgraph is a subgraph maximizing the average degree, and the cohesion of densest subgraphs is not necessarily monotone as the average degree increases. A k-Plex is defined as a subgraph G_i = (V_i, E_i) with K(G_i) ≥ |V_i| − k, so that C(G_i) = (|V_i| − k)/(|V_i| − 1) = 1 − (k − 1)/(|V_i| − 1). A natural way to define a k-Plex tree is to start with G as the root with k = |V|, the number of nodes in G. When we decrease k, the resulting (k−1)-Plexes are guaranteed to be subgraphs of the subgraph corresponding to the given k-Plex. However, depending on the size of the (k−1)-Plexes, the cohesion may be more or less than that of the parent. Specifically, consider a k-Plex H with n nodes and a (k−1)-Plex subgraph H′ with n′ nodes. We can show that C(H′) ≥ C(H) iff n′ ≥ 1 + ((k−2)/(k−1))(n − 1); the short derivation is given after this paragraph. Since in general there is no guarantee on the value of n′, cohesion is not monotone on a root-to-leaf path of a DSG tree defined using k-Plexes. Cliques and quasi-cliques are special cases of query-driven quasi-cliques with S = ∅, so we cannot assume they are already available. Moreover, all the aforementioned dense subgraph problems are NP-Hard and hard to approximate. In contrast, maximal k-Cores can be found exactly in polynomial time [57], so the construction cost of the core tree is very low. A k-Truss [34] is a special kind of (k−1)-Core with the added constraint that every edge be contained in at least (k − 2) triangles. The time complexity of k-Truss detection is much higher than that of k-Core detection – O(|V| · |E|) vs. O(|V| + |E|) – and it offers no distinct advantage over k-Cores for QMQ search. We thus conclude that k-Cores are the ideal choice for instantiating a DSG tree.
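The k-Plex inequality quoted above follows from a two-line computation; we reconstruct the omitted step here from the cohesion formula C(G_i) = 1 − (k−1)/(|V_i|−1):

\[
C(H) = \frac{n-k}{n-1} = 1 - \frac{k-1}{n-1},
\qquad
C(H') = \frac{n'-(k-1)}{n'-1} = 1 - \frac{k-2}{n'-1},
\]
\[
C(H') \ge C(H)
\;\Longleftrightarrow\;
\frac{k-2}{n'-1} \le \frac{k-1}{n-1}
\;\Longleftrightarrow\;
n' \ge 1 + \frac{k-2}{k-1}\,(n-1).
\]

Since a (k−1)-Plex child can be larger or smaller than this bound, its cohesion can fall on either side of the parent's, which is exactly the non-monotonicity argued above.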
Algorithm. Given a graph G(V,E), the procedure for generating a core tree from G is shown in Algorithm 1. Specifically, given a k-Core G_i, we generate all (k+1)-Cores by recursively removing nodes with degree less than k + 1; these (k+1)-Cores are added as the children of G_i in the core tree T. Since the highest k is the degeneracy of G, the time complexity of core tree generation is O(|E| + |V|), similar to the graph degeneracy computation [56].

Algorithm 1: Core Tree Generation
  Input: G(V,E)
  Output: Core tree T
1   set G as the root of T;
2   Set = {(G, 0)};
3   while Set ≠ ∅ do
4       get (G_i, k) from Set and remove it from Set;
5       recursively remove nodes from G_i whose degree is less than k + 1;
6       for each connected component G_j of the remaining graph of G_i do
7           set G_j as a child of G_i in T;
8           add (G_j, k + 1) into Set;
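A compact Python rendering of Algorithm 1 follows (a sketch using networkx; tree nodes are kept as plain (k, node set, parent index) triples rather than the concise record above, for brevity).

import networkx as nx

def build_core_tree(G):
    """Algorithm 1 sketch: peel nodes of degree < k+1 from each maximal
    k-Core to obtain its maximal (k+1)-Core children. Returns a list of
    (k, frozenset_of_nodes, parent_index) triples; entry 0 is the root G."""
    tree = [(0, frozenset(G.nodes), -1)]
    work = [(G.copy(), 0, 0)]            # (current subgraph, k, tree index)
    while work:
        H, k, idx = work.pop()
        # Recursively remove nodes whose degree is below k + 1.
        while True:
            low = [v for v in H.nodes if H.degree(v) < k + 1]
            if not low:
                break
            H.remove_nodes_from(low)
        for comp in nx.connected_components(H):
            # A child may equal its parent's node set (cf. the redundancy
            # remark above); the concise storage of Section 4.3.1 removes
            # this duplication in practice.
            tree.append((k + 1, frozenset(comp), idx))
            work.append((H.subgraph(comp).copy(), k + 1, len(tree) - 1))
    return tree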
4.3.2 Search for Pre-Solution

Henceforth, by a solution we mean a λ-quasi-clique containing the query nodes S. By a pre-solution we mean a maximal k-Core containing S whose cohesion is closest to that of a λ-quasi-clique. It may fall shy of the cohesion required of a solution, hence the term pre-solution. A graph G may or may not have a λ-quasi-clique containing S; if it does, one can be obtained efficiently from the pre-solution. Below, we give the formal definition of a pre-solution.

Definition 15 (Pre-Solution). Given a core tree T, λ and a query set S, the pre-solution for the QMQ Q^λ_S is a tree node T_i with a set of graph nodes V_i such that

    PreSolution(λ, S) = arg min_{T_i} { |C(T_i) − λ| : S ⊆ V_i }        (4.2)

Core trees offer two benefits for accelerating QMQ search: (1) given a query node set S and parameter λ, the pre-solution for the QMQ search can be generated efficiently by traversing the core tree; (2) given the pre-solution or the current solution for the QMQ, we can find a better neighboring solution efficiently by reducing the search space, exploiting the core tree. We use the terms graph node and tree node to distinguish between nodes of G and nodes of the core tree T. Recall that each tree node corresponds to a subgraph of G that is a maximal k-Core, for some k.

Main Idea. Given a graph G, λ, and S, let T be the corresponding core tree. The major steps in the search for a pre-solution, which we denote by T_Q, are:
1. For each graph node s ∈ S, find the tree node T_s that contains s and has the largest depth in T. Let T_S denote the lowest common ancestor in T of the set of tree nodes {T_s | s ∈ S};
2. If T_S is not a λ-quasi-clique, the pre-solution is simply T_S, i.e., T_Q = T_S;
3. If T_S is a λ-quasi-clique, we climb up the core tree T from T_S until we find an ancestor T_l that is a λ-quasi-clique whose parent T_u is not;
4. If |C(T_l) − λ| ≤ |C(T_u) − λ|, set the pre-solution T_Q = T_l; otherwise, set T_Q = T_u.

Notice that T_Q is sometimes a λ-quasi-clique (i.e., a solution) and sometimes not (a non-solution). Notice also that T_s is not necessarily a leaf node. For example, in Fig. 4.3, supposing s is a node in G1 but not in G2, T_s is G1. Going from the leaves to the root, {T_s | s ∈ S} is the set of tree nodes containing the first appearance of the query nodes; these are the maximal k-Cores with the highest k containing the query nodes. Since the root of the core tree T is the whole graph G, the lowest common ancestor T_S of the tree node set {T_s | s ∈ S} always exists for any query S. If T_S is not a λ-quasi-clique, it is still possible to find a subgraph of T_S that is a query-driven λ-quasi-clique for S. In this case, we let T_Q = T_u = T_S and jump to the optimization phase discussed in Section 4.4. Otherwise, T_S is a λ-quasi-clique, and we climb up the core tree from T_S to find an ancestor T_l that is a λ-quasi-clique whose parent T_u is not. According to Prop. 2, such a tree node pair T_l and T_u always exists for given T_S and λ. In the extreme case where T_l = Root and T_u = NULL, we return G directly as the final solution. Otherwise, the pre-solution T_Q is chosen from T_l and T_u, depending on which tree node has cohesion closer to λ. The pre-solution T_Q is the starting point of the iterative optimization discussed in Section 4.4.

Lower/Upper Bounds on the QMQ. By leveraging the core tree, we can confine the size of the QMQ Q̄^λ_S to a small range. Prop. 4 reveals the relationship between T_l, T_u and Q̄^λ_S.

Figure 4.4: (a) A small graph and (b) its corresponding core tree, showing the deepest tree nodes containing graph nodes v1, ..., v5; G5 and G6 in (b) are annotated by dotted circles in (a).

Proposition 4 Given tree nodes T_l and T_u on the core tree corresponding to subgraphs G_l and G_u, both containing the query S, if C(G_l) ≥ λ ≥ C(G_u), then the QMQ Q̄^λ_S(V̄_Q, Ē_Q) for S and λ satisfies

    |V_l| ≤ |V̄_Q| ≤ |V_u|        (4.3)

Proof: Since G_l contains S and C(G_l) ≥ λ, G_l is a query-driven λ-quasi-clique for S; so Q̄^λ_S(V̄_Q, Ē_Q), being a maximum λ-quasi-clique for the given S and λ, satisfies |V_l| ≤ |V̄_Q|. Notice that C(Q̄^λ_S) ≥ C(G_u), so supposing |V̄_Q| > |V_u|, we get that the minimum degree K(Q̄^λ_S) > K(G_u). This implies Q̄^λ_S is a K(Q̄^λ_S)-Core, and given that V̄_Q ∩ V_u ⊇ S ≠ ∅, Q̄^λ_S should be a subgraph of G_u. This contradicts the assumption that |V̄_Q| > |V_u|, so we have |V̄_Q| ≤ |V_u|. □

Prop. 4 indicates that the size of the QMQ is bounded by the sizes of G_l and G_u. Although in general the QMQ Q̄^λ_S shares many common graph nodes with G_l and G_u, Q̄^λ_S is not necessarily a supergraph of G_l or a subgraph of G_u. E.g., in Fig. 4.4(a), for S = {v2, v4, v5} and λ = 0.4, the 4-Core G5 containing S has cohesion 2/3, the QMQ (denoted by hollow nodes) has cohesion 3/7, and the former is not a subgraph of the latter. However, as a maximal k-Core containing S with cohesion very close to λ, the pre-solution picked from G_l and G_u serves as an ideal candidate for the iterative maximization towards Q̄^λ_S.

Example. The search for a pre-solution is illustrated in Fig. 4.4. Suppose λ = 0.2 and S = {v1, v2}. Then G5 is the lowest common ancestor T_S. We climb up the core tree to find G_l = G3 and G_u = G2 satisfying C(G_l) ≥ λ ≥ C(G_u). According to Prop. 4, the QMQ will be a subgraph of G whose size is "sandwiched" between those of G3 and G2. The pre-solution T_Q is set to G2 or G3, depending on which has cohesion closer to λ. As another example, suppose λ = 0.5 and S = {v1, v3}. We then get T_S = G2, which is not a λ-quasi-clique. In this case, we set the pre-solution T_Q = G2 and the iterative maximization process will check for the existence of the QMQ inside G2.
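The four steps above can be traced in the following sketch. It is our own illustration, assuming the (k, node set, parent) triples produced by the Algorithm 1 sketch and an inverted index inv mapping each graph node to the index of its deepest tree node, as maintained in Section 4.3.1; cohesion is computed exactly from the induced subgraph (assumed to have at least two nodes).

def find_pre_solution(tree, inv, G, S, lam):
    """Steps 1-4 of the pre-solution search (sketch). Returns the index of
    the pre-solution tree node."""
    def cohesion(idx):
        H = G.subgraph(tree[idx][1])
        return min(d for _, d in H.degree()) / (H.number_of_nodes() - 1)
    def ancestors(idx):                  # root -> ... -> idx
        path = []
        while idx != -1:
            path.append(idx)
            idx = tree[idx][2]
        return path[::-1]
    # Step 1: lowest common ancestor of the deepest tree nodes of S.
    paths = [ancestors(inv[s]) for s in S]
    T_S = 0                              # the root always contains S
    for level in zip(*paths):
        if len(set(level)) == 1:
            T_S = level[0]
        else:
            break
    # Step 2: if T_S is not a lambda-quasi-clique, it is the pre-solution.
    if cohesion(T_S) < lam:
        return T_S
    # Step 3: climb while the parent still has cohesion >= lambda.
    T_l = T_S
    while tree[T_l][2] != -1 and cohesion(tree[T_l][2]) >= lam:
        T_l = tree[T_l][2]
    T_u = tree[T_l][2]
    if T_u == -1:
        return T_l                       # extreme case: G itself qualifies
    # Step 4: pick the node whose cohesion is closest to lambda.
    return T_l if abs(cohesion(T_l) - lam) <= abs(cohesion(T_u) - lam) else T_u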
4.4 Query-Driven Quasi-Clique Maximization

In this section, we discuss our iterative maximization techniques for searching for a QMQ based on a given pre-solution.

4.4.1 Objective and Operations

Objective. Let G_i(V_i, E_i) be any connected subgraph of G with S ⊆ V_i. G_i may or may not be a λ-quasi-clique. For convenience, if G_i is a λ-quasi-clique, we call G_i a solution; otherwise, we call it a non-solution. Recall that a pre-solution may be a solution or a non-solution. Our objective, given such a graph G_i, is to find a solution and maximize its node size. In order to facilitate our search, we propose a new objective function below. Recall that D(G_i) denotes the density of G_i (see Def. 12).

    F(G_i) = |V_i| + D(G_i)   if G_i is a λ-quasi-clique
           = C(G_i)           otherwise                        (4.4)

F(G_i) is designed carefully and has the following benefits:
• When G_i is a λ-quasi-clique, since 0 ≤ D(G_i) ≤ 1, the term |V_i| dominates the objective function F(G_i) and F(G_i) > 1. Thus, maximizing F(G_i) prefers a larger solution. On the other hand, the density part D(G_i) prefers, among subgraphs of the same cardinality, the one with higher density; clearly, a subgraph with higher density has a higher potential to attract more nodes. In this case, F(G_i) is optimized to make the size of G_i as large as possible.
• When G_i is not a λ-quasi-clique, the cohesion C(G_i) steers the optimization process by rewarding subgraphs with higher cohesion. In this case, F(G_i) < λ ≤ 1 and F(G_i) is designed to transform G_i from a non-solution into a solution as soon as possible.
• As another benefit, the maximization of F(G_i) naturally prevents the cycling of quasi-cliques, i.e., a previous quasi-clique appearing again after a series of add and remove operations (defined in the next paragraph). To appreciate this point, note that in related work, Reactive Local Search (RLS, [9]) relies on an extra parameter T to prevent cycling: every time a node is added to or removed from the current solution, it cannot be considered for removal or addition for the next T iterations, where T needs to be tuned for different graphs, a tedious process.

Operations. The QMQ search can be viewed as a process of maximizing the objective function F(G_i), where G_i is initialized to the pre-solution T_Q, which may or may not be a λ-quasi-clique. We introduce three operations, Add, Remove and Swap, defined in Table 4.2 and explained below.

Operation                        Explanation
Add: G'_i = G_i + {v}            add a new node v into a λ-quasi-clique G_i such that F(G'_i) > F(G_i). Goal: increase the cardinality |V_i|
Remove: G'_i = G_i − {v}         remove an existing node v (v ∉ S) from G_i such that F(G'_i) > F(G_i). Goal: increase the cohesion C(G_i)
Swap: G'_i = G_i + {v1} − {v2}   add a new node v1 into a λ-quasi-clique G_i and remove another node v2 (v2 ∉ S) such that F(G'_i) > F(G_i). Goal: increase the density D(G_i)

Table 4.2: Three maximization operations.

• Add: G'_i = G_i + {v}. The Add operation is applied when both G_i and G'_i are λ-quasi-cliques.
• Remove: G'_i = G_i − {v}. If G_i is not a λ-quasi-clique, the Remove operation is applied to improve the cohesion. To make F(G'_i) > F(G_i), we need to ensure K(G'_i) ≥ K(G_i), i.e., we cannot remove a node v whose removal decreases the coreness.
• Swap: G'_i = G_i + {v1} − {v2}. Swap is applied when adding a node v1 makes G_i lose the λ-quasi-clique property, but removing another node v2 at the same time restores it. Instead of viewing Swap as a combination of Add and Remove, we view it as an atomic operation, since that allows us to move from one λ-quasi-clique to another. The motivation for swapping nodes is that the node swapped in may increase the density. Note that the edges associated with a node are added or removed automatically with the node.
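Eq. (4.4) drives all three operations and is cheap to evaluate directly. A minimal sketch (ours, assuming networkx and that `nodes` induces a connected subgraph, as in the definition of G_i):

import networkx as nx

def objective(G, nodes, lam):
    """Eq. (4.4): |V_i| + D(G_i) for a lambda-quasi-clique, else C(G_i).
    A score > 1 therefore always identifies a solution."""
    H = G.subgraph(nodes)
    n, m = H.number_of_nodes(), H.number_of_edges()
    K = min(d for _, d in H.degree())
    cohesion = K / (n - 1)
    if cohesion >= lam:                  # H is a lambda-quasi-clique
        density = 2 * m / (n * (n - 1))
        return n + density
    return cohesion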
If the pre-solution T_Q = G_l, which is a λ-quasi-clique, then to ensure F(G'_i) > F(G_i), Add and Swap will be used in the iterative maximization. If the pre-solution T_Q = G_u, then since C(G_u) < λ, on each iteration before C(G_i) ≥ λ is satisfied, G_i is a subgraph of G_u with coreness K(G_i) ≥ K(G_u). Prop. 5 covers this case and shows that Add cannot improve the F(G_i) score, and similarly for Swap. Thus, when the optimization process starts from G_u, we can only use Remove to improve the F(G_i) score, until G_i becomes a λ-quasi-clique.

Proposition 5 Suppose the iterative optimization starts from T_Q = G_u and the current state G_i is a non-solution, obtained by zero or more Remove operations applied to G_u. Then adding a node v to G_i cannot make F(G_i + {v}) > F(G_i).

Proof: Recall that a Remove operation is applied to a current state G_i only when the resulting F score does not decrease. We prove the result by induction on the number of Remove operations applied to T_Q = G_u.
Base case: zero Remove operations are applied to G_u. Then, since G_i = G_u is a maximal k-Core, adding a node into G_u will decrease the coreness and make F(G_i + {v}) < F(G_i).
Induction: suppose G_i is a non-solution obtained from G_u after m Remove operations. Notice that if any of the predecessors of G_i obtained during the Remove sequence were G_l, we could not have applied Remove to it while keeping the F score non-decreasing. From this, it follows that G_i cannot be a subgraph of G_l. By the rule for applying the Remove operation, K(G_i) ≥ K(G_u). Let G'_i = G_i + {v}. By Eq. (4.4), since G_i is not a λ-quasi-clique, we have F(G_i) = C(G_i) = K(G_i)/(|V_i| − 1). To make F(G'_i) = K(G'_i)/|V_i| > K(G_i)/(|V_i| − 1), we have to make K(G'_i) > K(G_i). Since K(G_l) = K(G_u) + 1, we have K(G'_i) ≥ K(G_l). Noticing that G'_i and G_l share the node set S, we conclude by Proposition 1 that G'_i should be a subgraph of G_l. This conflicts with our inference that G_i is not a subgraph of G_l, so F(G_i + {v}) > F(G_i) does not hold for any v. □

4.4.2 Efficient Solution Search and Ranking

Given a connected subgraph G_i, which may be a solution or a non-solution containing S, we now discuss efficient techniques for identifying neighboring subgraphs of G_i with better F scores, using the three search operations. According to Eq. (4.4), if G_i is not a λ-quasi-clique, improving its F score amounts to improving its cohesion. If it is, then improving its F score amounts to increasing its size and, if the size cannot be increased, improving the density. We let E(V_1, V_2) denote the set of edges connecting the node sets V_1 and V_2. Below, we discuss the solution search using Add, Swap and Remove, respectively.
Search by Add. Add is designed to increase the size of an existing quasi-clique. The solution set for Add is

    Add(G_i) = {v | F(G'_i) > F(G_i) > 1, G'_i = G_i + {v}}        (4.5)

That is, Add(G_i) is the set of single nodes that can be added to G_i to improve its F score. According to Eq. (4.4) and Def. 12, since |V'_i| = |V_i| + 1 is fixed, the only quantity that discriminates between candidate nodes v ∈ Add(G_i) in terms of F score is |E'_i|. Thus, we can start from G_i and rank neighboring nodes v ∉ V_i by the edge count |E(G_i, {v})|, provided G'_i is still a λ-quasi-clique. The node v with the highest |E(G_i, {v})| satisfying C(G_i + {v}) ≥ λ produces the best solution G'_i, with the highest F score. Notice that we do not need to check every neighboring node of G_i to find the best solution: since G_i is a subgraph of some tree node T_I with K(T_I) ≤ K(G_i), if there is a graph node v in T_I but not in G_i with |E(G_i, {v})| ≥ K(T_I), then for any graph node v′ not in T_I we have |E(G_i, {v′})| < |E(G_i, {v})|. Otherwise, if |E(G_i, {v′})| ≥ K(T_I), then |E(T_I, {v′})| ≥ |E(G_i, {v′})| ≥ K(T_I), making T_I + {v′} a K(T_I)-Core and violating the maximality of T_I. For example, in Fig. 4.4(b), supposing G_i is a subgraph of the tree node G3, if there is a graph node v in G3 but not in G_i having |E(G_i, {v})| ≥ 3, then we cannot find a node v′ outside G3 with |E(G_i, {v′})| ≥ |E(G_i, {v})|. Thus, starting from the deepest tree node that contains G_i, we can climb up the core tree level by level and check whether the current tree node T_I contains a graph node v not in G_i with |E(G_i, {v})| ≥ K(T_I). Once such a node is found, we stop the solution search, and the current best solution is guaranteed to be the best among all solutions in G(V,E). This approach can be extended to find the exact top-n solutions (shown in Alg. 3). Since we do not need to check every neighbor of G_i, the search space of Add is distinctly smaller than that of existing approaches [1, 15], with guaranteed quality.

Search by Swap. Let G'_i = G_i + {v1} − {v2} and G''_i = G_i + {v1}. Swap is designed to improve the density when G''_i is not a λ-quasi-clique but both G_i and G'_i are. The solution set of Swap is

    Swap(G_i) = {(v1, v2) | F(G'_i) > F(G_i) > 1 ≥ λ > F(G''_i),
                 G'_i = G_i + {v1} − {v2}, G''_i = G_i + {v1}}        (4.6)

Swap implies deg(v2, G_i) ≥ λ(|V_i| − 1), deg(v2, G''_i) < λ|V_i| and deg(v1, G'_i) > deg(v2, G_i). In practice, if Add(G_i) = ∅ and there is a node v2 (v2 ∉ S) in G_i satisfying deg(v2, G''_i) < λ|V_i| and deg(v1, G'_i) > deg(v2, G_i), a swap operation may happen. We rank the node pairs (v1, v2) by deg(v1, G'_i) − deg(v2, G_i) for Swap solutions.

Search by Remove. Remove is designed to improve the cohesion of a non-solution G_i. We define the neighboring (solution or non-solution) set of Remove as

    Remove(G_i) = {v | F(G'_i) > F(G_i), F(G_i) < λ ≤ 1, G'_i = G_i − {v}}        (4.7)

Clearly, v is a node in G_i. We can rank candidates v in G_i by the coreness change ∆K = K(G'_i) − K(G_i) resulting from removing v from G_i. If ∆K < 0, then v ∉ Remove(G_i), since the F score decreases. Thus, we cannot simply remove the node with the minimum degree inside G_i: suppose v1 and v2 both have the minimum degree in G_i and they are connected; then removing v1 makes the coreness of the remaining graph lower than before, which results in a lower F score. If multiple candidate nodes have the same coreness change ∆K, we rank them further by their degree in G_i. Removing the node with the highest ∆K and lowest degree in G_i maximizes the gain in F and density, and a higher density has a higher potential to increase the cohesion.
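Of the three searches, Add benefits most from the core tree. The level-by-level climb can be sketched as follows (our illustration only; start_idx, the deepest core-tree node containing G_i, is assumed to be supplied by the caller, and K(T_I) is recomputed exactly rather than cached):

def best_add_candidate(G, tree, Gi_nodes, lam, start_idx):
    """Search-by-Add sketch. Candidates v are ranked by |E(G_i, {v})|;
    once the best feasible candidate inside the current tree node T_I
    reaches K(T_I), no node outside T_I can beat it (the argument given
    above), so the climb stops."""
    def edges_to(v):
        return sum(1 for u in G.neighbors(v) if u in Gi_nodes)
    def keeps_quasi_clique(v):
        # Min-degree >= lam * (|V_i'| - 1) > 0 also guarantees connectivity.
        new = Gi_nodes | {v}
        H = G.subgraph(new)
        return min(d for _, d in H.degree()) >= lam * (len(new) - 1)
    idx, best, best_score = start_idx, None, -1
    while idx != -1:
        k, nodes, parent = tree[idx]
        K_TI = min(d for _, d in G.subgraph(nodes).degree())   # K(T_I)
        for v in nodes - Gi_nodes:       # rescanning deeper nodes is wasteful
            s = edges_to(v)              # but harmless in this sketch
            if s > best_score and keeps_quasi_clique(v):
                best, best_score = v, s
        if best_score >= K_TI:
            break                        # bound reached: stop climbing
        idx = parent
    return best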
4.4.3 Iterative Maximization Algorithms

Since the maximum clique problem is NP-Hard and inapproximable within a factor of n^{1−ε} ([6, 23, 31]), designing heuristic algorithms for QMQ search with guaranteed quality is unrealistic. In this section, we propose two iterative maximization algorithms, DIM and SUM, to approach the QMQ. DIM (Deterministic Iterative Maximization) is a deterministic approach that greedily moves to the best neighboring solution or non-solution on each iteration. In contrast, SUM (Stochastically Updated Maximization) is a stochastic approach that moves, with some probability, to one of several good neighboring solutions or non-solutions, and thus has the potential to find better results than DIM. Both DIM and SUM are based on local search over the core tree. In practice, we find that both of them find solutions of high quality whenever a QMQ containing the query nodes exists.

If the current state G_i is a non-solution, then according to Prop. 5, both DIM and SUM will try to remove nodes from G_i to make it become a solution (i.e., a λ-quasi-clique containing S). Once G_i is a solution, DIM and SUM will try to apply Add or Swap to grow it. Since F(G_i) provides a uniform way to evaluate G_i whether or not it is a solution, for ease of presentation we use the term "answer" as a generic term that may refer to a solution or a non-solution.

Local Search. Readers are referred to [33, 52] for complete details of local search. Here, we briefly revisit the major steps:
1. Given the current answer, find feasible candidate answers in the neighborhood by one of the operations;
2. Evaluate the candidates using the objective function F;
3. If the marginal gain is positive, move to the best candidate and jump to step 1. Otherwise, return the current solution, or NULL if the current answer is a non-solution.

Deterministic Iterative Maximization. DIM follows the greedy strategy of always moving to the best available neighboring answer. The major steps of the DIM approach are shown in Alg. 2. The pre-solution T_Q is the starting point of the maximization. If G_i is a non-solution (C(G_i) < λ), we repeat Remove operations until G_i becomes a λ-quasi-clique or it is determined that there is no qualified λ-quasi-clique containing S in G (lines 3–6). If G_i is already a solution, we repeat Add and Swap operations iteratively until F(G_i) cannot be improved any more (lines 7–20). In particular, we start with Add operations, exploring neighbors on the core tree level by level, and if qualified nodes are found, we pick the best node to add (lines 8–15); otherwise, we try to Swap nodes (lines 16–20). We repeat Add and Swap until the F score cannot be improved. An example of the scheduling of operations is shown in Fig. 4.5(a), in which Remove operations are performed first if T_Q is not a λ-quasi-clique, followed by Add or Swap operations. Supposing the total number of iterations of DIM is T and, on average, n answers are explored per iteration, the time complexity of DIM is O(Tn).

Algorithm 2: DIM
  Input: Pre-solution T_Q, S, λ
  Output: the QMQ Q̄^λ_S
1   if T_Q is the root node G and C(T_Q) ≥ λ then return G;
2   G_i = T_Q;
3   while G_i is not a λ-quasi-clique do
4       rank each node v in G_i by ∆ = K(G_i − {v}) + D(G_i − {v}) − K(G_i) − D(G_i);
5       remove the node not in S with the highest positive ∆;
6       if G_i is not connected or is unchanged then return NULL;
7   while true do
8       AddSet = ∅ and T_I = the deepest tree node containing G_i;
9       while max |E(G_i, v)| < K(T_I) for v in T_I but not in G_i and AddSet do
10          add v into AddSet if C(G_i + {v}) ≥ λ;
11          if T_I is not the root then
12              T_I = the parent of T_I;
13          else break WHILE loop;
14      if AddSet ≠ ∅ then
15          add v = arg max{|E(G_i, v)| : v ∈ AddSet} into G_i;
16      else
17          v1 = arg max{|E(G_i, v)| : v ∈ V − V_i};
18          G''_i = G_i + {v1}, G'_i = G_i + {v1} − {v2};
19          if there exists v2 ∈ V_i − S such that deg(v2, G''_i) < λ|V_i| and deg(v1, G'_i) > deg(v2, G_i) then G_i = G'_i;
20          else break WHILE loop and return G_i;

DIM always picks the best available neighboring answer on each iteration. While this greedy heuristic is effective and is followed by many local search methods, it may converge to a local maximum. In the following, we propose SUM (Stochastically Updated Maximization), a stochastic approach that moves with some probability to one of the good neighboring answers. SUM has the potential to break the local-maximum limitation and find better results than DIM.
Stochastically Updated Maximization. Given the current answer G_i, we define the marginal gain ∆F = F(G'_i) − F(G_i) for a neighboring answer G'_i. As discussed in Section 4.4.2, for each operation the neighboring answers are ranked by ∆F in descending order, and we use the function F(x) to denote the marginal gain of the answer ranked at the x-th place; e.g., F(1) denotes the maximum marginal gain.

In related work, GRASP [1] is a well-known randomized local search method based on the threshold δ = min F(x) + c(max F(x) − min F(x)), where c ∈ [0, 1]. The new answer is picked uniformly at random from all answers with F(x) ≥ δ. If c = 1, GRASP always moves to the best answer, like DIM; if c = 0, GRASP performs a random selection over all neighboring answers. However, GRASP has two drawbacks: (1) the parameter c is difficult to tune; (2) GRASP totally ignores the actual distribution of marginal gains as x varies. In our proposed Stochastically Updated Maximization (SUM) algorithm, we overcome these limitations by picking the new answer based on the "inflection point" (x*, F(x*)) of F(x). The inflection point is the point where the change in F(x) is maximized. That is,

    x* = arg max_x (F(x) − F(x + 1))        (4.8)

Figure 4.5: (a) An example of QMQ maximization, scheduling Remove, Add and Swap operations. (b) Illustrating the inflection point of F(x), where solutions with rank x ≤ x* are preferred.

The inflection point splits F(x) into a high-marginal-gain part and a low-marginal-gain part. Fig. 4.5(b) illustrates the inflection point of F(x). SUM draws an answer from all high-marginal-gain answers with rank x ≤ x*, using the probability

    P(x) = F(x) / Σ_{F(x′) ≥ F(x*)} F(x′)        (4.9)

P(x) can be understood as the ratio between answer x's marginal gain and the sum of all marginal gains at least F(x*). It is possible that an answer other than the best is picked as the new answer, though its probability of being picked is lower than that of the best answer. On the other hand, since non-greedy answers can be selected with non-zero probability, SUM has the potential to avoid being trapped in a local maximum. Clearly, the output of DIM is just one possible output of SUM, and SUM may produce much better results than DIM.

The major steps of SUM are shown in Alg. 3. As can be seen, SUM and DIM have a similar scheduling strategy for the maximization operations; however, SUM probabilistically picks the new answer from a set of good answers. The preparation of the solution set SolGain can be found in lines 4–10 for Remove, lines 12–19 for Add and lines 21–28 for Swap. Supposing the total number of iterations of SUM is T and, on each iteration, the marginal gain is evaluated on n answers, the time complexity of each simulation of SUM is O(Tn²).
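The inflection-point rule of Eqs. (4.8) and (4.9) is simple to implement. A sketch (ours; it assumes the candidate answers have already been generated and that only positive marginal gains are passed in):

import random

def stochastic_pick(gains):
    """SUM's selection rule: locate the inflection point x* (Eq. 4.8) over
    the descending-sorted gains, then sample one answer of rank <= x* with
    probability proportional to its gain (Eq. 4.9). Returns the chosen
    answer's index in `gains`."""
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    ranked = [gains[i] for i in order]          # F(1) >= F(2) >= ...
    if len(ranked) == 1:
        return order[0]
    drops = [ranked[i] - ranked[i + 1] for i in range(len(ranked) - 1)]
    x_star = max(range(len(drops)), key=drops.__getitem__)
    head = order[:x_star + 1]                   # answers with rank <= x*
    weights = ranked[:x_star + 1]
    return random.choices(head, weights=weights, k=1)[0]

Note that the greedy choice (the top-ranked answer) always carries the largest sampling weight, which is what makes DIM's output one possible output of SUM.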
4.5 Experimental Study

All experiments are conducted on a computer with a 2.90 GHz Intel CPU and 8 GB RAM. All algorithms are implemented in Java. We conduct experiments on three real data sets, described below.

LiveJournal Social Network. LiveJournal is an online blogging website with a friendship network. We model each user as a node and friendship ties as edges in the LiveJournal social network. Available at SNAP (http://snap.stanford.edu/data/com-LiveJournal.html), this dataset has 3,997,962 nodes and 34,681,189 edges in total. A social circle in the LiveJournal social network is best modeled by a quasi-clique, since every user in such a circle declares friendship with the majority of its members. The query-driven maximum quasi-clique corresponds to the largest social circle containing the query users, which supports applications like social community search.

DBLP Co-Authorship Network. To simulate social collaborations, we use the DBLP (http://dblp.uni-trier.de/xml/) co-authorship network. Each author is modeled as a node, and two authors are linked if they have collaborated on at least one paper. In total, the DBLP co-authorship network has 1,206,881 nodes and 8,894,745 edges. A quasi-clique in the co-authorship network is a group of authors working closely with each other. Given a set of query researchers, query-driven maximum quasi-clique search seeks the largest collaboration network among them.

Youtube Sharing Network. This dataset, available at SNAP (http://snap.stanford.edu/data/com-Youtube.html), is based on Youtube's social network, where two people are connected if they have similar interests in videos and become friends. There are 1,134,890 nodes and 2,987,624 edges in total. Given a set of users, the query-driven maximum quasi-clique is the largest video sharing group containing these query users.

4.5.1 Core Tree Construction

Core tree construction is an offline step. The running time of core tree construction on the three datasets is shown in Table 4.3, where we also report the highest k (i.e., the degeneracy [47]) for each dataset. Although the LiveJournal and DBLP datasets have 4M and 1.2M nodes respectively, the DBLP dataset takes more time than the LiveJournal dataset, which indicates that apart from the graph size, the graph structure is a significant factor in the performance of core tree construction.

Figure 4.6: The number of maximal cores first rapidly increases, and then slowly decreases, with the coreness k on the three data sets.

Specifically, in Fig. 4.6 we show the number of maximal k-Cores for each k on the three data sets. As we can see, from the root to the leaves, the number of maximal cores first rapidly increases and then slowly decreases with the coreness k for all datasets. In particular, there is a huge number of 2-Cores and 3-Cores in the DBLP co-authorship network, because many graduate students publish papers only with their supervisors or a few collaborators. This is a dominant factor in the core tree construction time, as it means the core tree for DBLP has many more nodes than those of the other datasets.
4.5.2 Performance Evaluation

Baselines. To the best of our knowledge, there is no previous study on query-driven maximum quasi-clique search. Therefore, we adapt previous approaches for maximum quasi-clique detection (without a query) as baselines. In our evaluation, we designed two categories of baselines based on related work: operation baselines and optimization baselines. As operation baselines we use Add-MC and Remove-MC, which are discussed in [15] and extended from the traditional maximum clique problem. These baselines are compared with the Add, Remove and Swap operations discussed in Section 4.4.2.
• Add-MC: Call a node u in a λ-quasi-clique G_i a critical node if λ(|V_i| − 1) ≤ deg(u, G_i) < λ|V_i|. Add the node v with the maximum deg(v, G_i + {v}) to the quasi-clique G_i such that: (i) deg(v, G_i + {v}) ≥ λ|V_i| and (ii) v is adjacent to every critical node of G_i.
• Remove-MC: Remove the node v from G_i whose removal results in the largest set of Add-MC nodes.

As optimization baselines we use RLS (Reactive Local Search) [15], GRASP [1], and DFS-Tree [50, 74], which are compared with DIM and SUM.
• RLS: At each step, choose the best Add-MC node. If no node can be added, choose the best Remove-MC node. Every time a node is added or removed, it cannot be removed or added again for the next move.
• GRASP: A randomized local search method based on thresholds. We set c = 0.5 and adapt GRASP to the core tree by picking a new solution uniformly at random from all solutions with objective scores higher than the average.
• DFS-Tree: Build depth-first search trees of quasi-cliques by setting each query node as a root, and return the largest tree node among these trees that contains all query nodes, adapted from [50, 74].

Figure 4.7: (a) Running time of 1000 pre-solution queries as the query set size increases. (b) Portion of pre-solutions chosen from G_l, out of all pre-solutions, for different query set sizes; λ = 0.9.

Pre-Solution on Core Tree. Finding a pre-solution is one of the key steps of our method. In this section, we evaluate the efficiency of our algorithm for finding a pre-solution. We generate queries with q nodes by randomly choosing q nodes from a subgraph of radius 2, itself randomly chosen from the original graph. The core tree built in the offline step provides an extremely efficient way to find the pre-solution for a given query. In Fig. 4.7(a), we show the running time of 1000 pre-solution queries for different query sizes. With the help of the core tree, the pre-solution search for λ and S reduces to a few tree traversal operations and is extremely efficient: 1000 pre-solution searches complete within a few milliseconds, on data sets with millions of nodes. Except for two special cases – the whole graph is a λ-quasi-clique, or no tree node is a λ-quasi-clique containing S – a pre-solution is chosen from either G_l or G_u, depending on the proximity of their cohesion to λ. As shown in Fig. 4.7(b), the majority of pre-solutions are chosen from G_l, for which Add and/or Swap are applicable.
Maximization. The efficiency of an iterative maximization process is determined by two key aspects: the cost of a single iteration and the convergence rate. We evaluate the performance of iterative maximization from these two aspects.

Fig. 4.8(a) shows the running time for searching neighboring solutions using Add and Add-MC with different sizes (i.e., numbers of nodes) of the current solution G_i on LiveJournal. Owing to the given graph structure, different G_i may have very different numbers of neighbors, resulting in a skewed distribution of running times. On average, the Add operation is 2.6 times faster than Add-MC: each Add takes 24 ms, while each Add-MC takes 62 ms. This is explained by the observation that an exploration using Add has a remarkably smaller search space than Add-MC, since Add searches for solutions by exploring neighboring nodes level by level and stops immediately once the top n solutions are found on the core tree, with n set to a small number, e.g., 10. Add-MC, in contrast, needs to check every neighboring node to find the top solutions. Notice that for SUM, Add finds the top-n solutions, whereas for DIM only the top-1 solution is needed. Fig. 4.8(b) compares the ratio of the search space of Add to that of Add-MC. As the query size increases, this ratio decreases, indicating that Add needs to search only a fraction of Add-MC's search space as the query grows. The added value of Add over Add-MC in the quality of results is discussed in detail below (see Table 4.4).

Next, given a pre-solution as the starting point of optimization, we measure the number of iterations taken by each optimization method before convergence. For every query size, we repeat the experiment 100 times and show in Fig. 4.8(c) the average number of iterations for each of the four maximization methods: the numbers show a common decreasing trend as the query size increases. This makes sense because a larger query size is more likely to result in a larger solution, which sets a higher bar for adding new nodes. Fig. 4.8(c) also shows that SUM takes the highest number of iterations, but we will see below that SUM also succeeds in finding larger quasi-cliques more often than all other methods.

Total Running Time. Unlike RLS and GRASP, the DFS-Tree baseline is not an iterative optimization method. Instead, "DFS-Tree" represents a category of methods [50, 74] that find maximal quasi-cliques by exploring the space of quasi-cliques in a depth-first search tree, in which each tree node corresponds to a quasi-clique. Since the DFS-Tree method is not originally designed for the QMQ search problem, we adapt it as a baseline by applying pruning techniques and returning the largest node of the DFS tree (corresponding to a quasi-clique) containing all query nodes. In Fig. 4.8(d), we show the average running time of these methods on LiveJournal as the query size |S| increases. RLS, GRASP, DIM, and SUM have comparable running times, with RLS slower than the other three; this is explained by the fact that RLS uses Add-MC and Remove-MC in place of the faster Add and Remove. DFS-Tree quickly explodes as the query size increases; in fact, its running time trend is opposite to that of DIM and SUM. This is because the DFS-Tree method needs to build a DFS tree for every query node in S, and the cost of searching for the largest tree node containing all query nodes increases with |S|, making DFS-Tree based methods consume more memory and perform worse than our core tree based methods, DIM and SUM. The running time of GRASP is comparable to that of SUM.
Quality Comparison. As discussed in Section 4.2, there is unlikely to be any heuristic algorithm with a guaranteed approximation to the optimal QMQ. To measure the quality of the proposed approaches empirically, we designed the following experiment: given the same pre-solution, we run RLS, GRASP, DFS-Tree, DIM and SUM independently to get the QMQ w.r.t. S and λ. We rank these QMQs by size, and each method producing the largest QMQ earns a hit score of one point. Randomly selecting queries of size 3, we repeat the experiment 100 times and list the final score for each method in Table 4.4. Notice that multiple methods may generate the same final QMQ and thus may all earn a hit score in a given round, so the sum of the scores is higher than 100. The score of GRASP is the lowest, because it uniformly selects a neighboring solution from a set of above-average solutions, and in many cases GRASP ends up with a final solution that is not the best. In theory, DFS-Tree should yield a high overall score, but since we have to prune the search path to control its memory consumption [50, 74], it achieves an overall score only slightly higher than GRASP. Both RLS and DIM greedily move to the new solution with the highest marginal gain, but since DIM uses Swap to improve the density while RLS provides no Swap, DIM achieves a higher quality score than RLS. Finally, SUM achieves the highest overall score, because not only is SUM capable of using Swap to improve the density, it also strikes a balance between greed and opportunity: while it is more likely to move to a new solution with a higher marginal gain, it retains the potential to jump out of a local maximum and thus reach a better final solution, thanks to the non-zero probability of choosing a neighboring solution that is not (locally) the best.

4.6 Discussion and Conclusion

In this chapter, we discussed the cohesion-persistent story search problem. Given a query consisting of a set of nodes S in a graph G(V,E) and a parameter λ ∈ (0, 1], we focus on finding the largest λ-quasi-clique containing S, which has many applications in real-world networks. This problem is NP-Hard and hard to approximate, calling for clever heuristic solutions. To quickly find a dense subgraph containing S with cohesion close to λ, as the pre-solution of the final solution, we propose the notion of a core tree, built by recursively organizing the maximal cores of G. We make use of three optimization operations: Add, Remove and Swap. We then propose two iterative maximization algorithms, DIM and SUM, which approach the query-driven maximum quasi-clique by deterministic and stochastic means, respectively. With extensive experiments on three real datasets, we demonstrate that our algorithms significantly outperform several natural baselines based on the state of the art, in running time and/or the quality of the solution found. This work raises a number of open questions. It is interesting to ask whether the operations can be applied in bulk mode, at the level of subgraphs, instead of the node-by-node mode that we have taken. It is also important to investigate whether our techniques can be extended and generalized to search for other kinds of dense subgraphs, such as k-Plexes or optimal quasi-cliques [73].
Algorithm 3: SUM
  Input: Pre-solution T_Q, S, λ, n
  Output: the QMQ Q̄^λ_S
1   if T_Q is the root node G and C(T_Q) ≥ λ then return G;
2   G_i = T_Q;
3   while G_i is not a λ-quasi-clique do
4       set SolGain = ∅;
5       for each node v in G_i but not in S do
6           Solution = G_i − {v}, Gain = F(Solution) − F(G_i);
7           add (Solution, Gain) into SolGain;
8       if SolGain ≠ ∅ then
9           G_i = stochastically pick an answer from SolGain with gain higher than F(x*);
10      else return NULL;
11  while true do
12      AddSet = ∅ and T_I = the deepest tree node containing G_i;
13      while the top n-th |E(G_i, v)| ≤ K(T_I) for v in T_I but not in G_i and AddSet do
14          add v into AddSet if C(G_i + {v}) ≥ λ;
15          if T_I is not the root then
16              T_I = the parent of T_I;
17          else break WHILE loop;
18      if AddSet ≠ ∅ then
19          stochastically pick v from the nodes in AddSet with gain higher than F(x*) and let G_i := G_i + {v};
20      else
21          set SolGain = ∅;
22          for each node v1 ∈ V − V_i with |E(G_i, v1)| > deg(v2, G_i) + 1 do
23              G''_i = G_i + {v1}, G'_i = G_i + {v1} − {v2};
24              if there exists v2 ∈ V_i − S such that deg(v2, G''_i) < λ|V_i| and deg(v1, G'_i) > deg(v2, G_i) then
25                  add (G'_i, F(G'_i) − F(G_i)) into SolGain;
26          if SolGain ≠ ∅ then
27              G_i = stochastically pick an answer from SolGain with gain higher than F(x*);
28          else break WHILE loop and return G_i;

Data set            LiveJournal   DBLP   Youtube
Time (seconds)      346           486    29
Highest coreness    361           119    52

Table 4.3: Running time of core tree construction.

Figure 4.8: (a) Running time of Add and Add-MC for different solution sizes, with query node size |S| = 3; (b) ratio between the search spaces of Add and Add-MC, given the current solution, as the query size increases; (c) average number of iterations for the various methods on the LiveJournal data set; (d) average running time of the different methods on LiveJournal as the query size increases; λ = 0.9 by default.

Method   SUM   DIM   DFS-Tree   GRASP   RLS
Score    77    41    11         9       17

Table 4.4: Aggregated quality scores of the different methods over 100 tests.

Chapter 5

Context-Aware Story-Telling

Stories identified from the post network may be highly related; e.g., the two stories "the launch of Blackberry 10" and "BlackBerry Super Bowl ad" are highly related. The detection of story relatedness, also called story context search, is a fundamental task in social stream mining. Story context search aims at efficiently finding the neighboring stories of a given story in the post network. In this chapter, we exploit and integrate signals from different perspectives (e.g., content or time) to compute the relatedness between stories. Building on that, we propose effective and efficient story context search algorithms to construct a network of stories in social streams.

5.1 Introduction

Many previous studies [48, 55, 66, 77] focus on detecting newly emerging stories from social streams. They serve the need of answering "what's happening now?". In reality, however, stories usually do not happen in isolation, and existing studies fail to track the relatedness between them. For example, "Crimea votes to join Russia" (on March 6, 2014) and "President Yanukovych signs compromise" (on February 21, 2014) are two separate stories, but they are actually highly related under the same event, "Ukraine Crisis". In this chapter, our goal is to design a context-aware story-teller for streaming social content, which not merely detects trending stories in a given time window of observation, but also builds the "context" of each story by measuring its relatedness to other stories on the fly.
As a result, our story-teller has the advantage of being able to respond to advanced user queries like "tell me related stories", which is crucial for digesting social streams.

Building a context-aware story-teller over streaming social content is a highly challenging problem, as explained below:
• Effective organization of social content. It is well known that posts such as tweets are usually short and written informally with many grammatical errors; even worse, a correctly written post may have no significance and be just noise.
• Identification of transient stories in a time window. Story detection should be robust to noise and efficient enough to support the single-pass tracking essential in a streaming environment.
• Story context search on the fly. Story relatedness computation should be interpretable and efficient enough to support online queries.

To the best of our knowledge, no training data set for context-aware story-telling on social streams is publicly available, which renders the existing works [59, 68] on Story Link Detection (SLD) inapplicable, because SLD is trained on well-written news articles. Furthermore, we cannot apply topic tracking techniques [27, 32] to story context search, because topic tracking is usually formulated as a classification problem [4], with the assumption that topics are predefined before tracking, which is unrealistic for story context search on social streams.

To address the above challenges, we propose CAST, a Context-Aware Story-Teller specifically designed for social streams. CAST takes a noisy social stream as input and outputs a "story vein", which is a human-digestible and evolving summarization graph linking highly related stories. The major workflow of CAST is illustrated in Figure 5.1. First of all, we treat each post as a node and add an edge between two posts if they are similar enough. For example, "Australian authorities update search for MH370" and "new MH370 search area is closer to Australia" are two similar posts, so we add an edge between them. In this way, a post network is constructed for the social posts in the same observation time window. Second, we propose the new notion of a (k, d)-Core to define a transient story, which is a cohesive subgraph of the post network. In a (k, d)-Core, every node has at least k neighbors and the two end nodes of every edge have at least d common neighbors. We propose two algorithms, Zigzag and NodeFirst, for the efficient discovery of maximal (k, d)-Cores from the post network. After that, we define the iceberg query, which finds the highly related stories of a given story. Two approaches, deterministic context search (DCS) and randomized context search (RCS), are proposed to implement the iceberg query with high efficiency. The story vein is constructed based on the iceberg query and serves as the backend of the context-aware story-teller on social streams, which discovers new stories and recommends related stories to users at each moment. Typical users of CAST are daily social stream consumers, who receive an overwhelming (noisy) buzz and wish to digest it through an intuitive and summarized representation in real time.

The main contributions of this chapter are summarized below:
• We define a new cohesive subgraph called the (k, d)-Core to represent stories, and propose two efficient algorithms, Zigzag and NodeFirst, to identify maximal (k, d)-Cores from the post network (Section 5.3);
• We propose deterministic and randomized context search to support the iceberg query for highly related stories, which builds the story vein on the fly (Section 5.4);
• Our experimental study on real Twitter streams shows that StoryVein can effectively digest streaming social content and build an expressive context-aware story-teller (Section 5.5).

We conclude this chapter in Section 5.6. Major notations are shown in Table 5.1.

Figure 5.1: Illustrating the workflow of StoryVein, which has three major steps: (1) post network construction, (2) transient story discovery and (3) story context search.

Q               a social stream
s(p_i, p_j)     the similarity between posts p_i and p_j
G(V, E)         post network
𝒢(𝒱, ℰ)         story vein
S(V_S, E_S)     a story S
Cor(S, S′)      the relatedness between stories S and S′

Table 5.1: Major notations.

5.2 Story Vein

In this chapter, we propose CAST, an effective context-aware story-teller for social streams. As social posts flow in, CAST is able to discover new stories and track the "vein" between stories, in which each story is a group of highly similar posts telling the same thing in the social stream, and each vein is a relatedness link between two stories. We define the story vein as follows, where we use Cor(S_i, S_j) to denote the relatedness between stories S_i and S_j.

Definition 16 (Story Vein) Given a post network G(V,E) and a threshold γ (0 < γ < 1), the output of CAST can be represented by a directed graph 𝒢(𝒱, ℰ), where each node S ∈ 𝒱 is a story and each edge (S_i, S_j) ∈ ℰ means the story relatedness Cor(S_i, S_j) ≥ γ and S_i happens earlier than S_j.

The motivation behind CAST is that real-world stories are not isolated but commonly correlated with each other. Intuitively, the context of a story S in 𝒢(𝒱, ℰ) is the set of neighboring stories of S. In particular, we use upper and lower context to denote the stories connected by the incoming and outgoing edges of S, respectively. The goal of CAST is to discover new stories and build the contexts of these stories. As the social stream Q gets updated over time, the story vein 𝒢(𝒱, ℰ) is also updated in real time. The dynamic nature of context-aware story-telling raises challenges both in the discovery of new stories and in the tracking of story context. In the following, we address these challenges by discussing transient story discovery in Section 5.3 and story context tracking in Section 5.4.

5.3 Transient Story Discovery

In this section, we describe the motivation and algorithms for identifying transient stories as a new kind of cohesive subgraph, the (k, d)-Core, in the post network.

5.3.1 Defining a Story

Edge weights in a post network have very natural semantics: the higher the post similarity, the more likely two posts are to talk about the same story. It is well known that a cohesive subgraph is a substructure with high edge density and very low internal commute distance [75]. Suppose S(V_S, E_S) is a cohesive subgraph of G(V,E). Since the nodes in S(V_S, E_S) are densely connected to each other, it is very likely that all nodes in S(V_S, E_S) share the same content and tell the same story. Based on this observation, we model a story in a social stream as a cohesive subgraph of the post network.
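To make the first step of Figure 5.1 concrete, here is a minimal post-network construction sketch. The word-set Jaccard similarity and the quadratic pairwise loop are stand-ins for illustration only; the actual post similarity s(p_i, p_j) and the streaming, index-based construction are those defined in Chapter 3.

import networkx as nx

def build_post_network(posts, threshold):
    """Sketch: one node per post, an edge whenever similarity exceeds
    `threshold`. Jaccard over word sets is a stand-in for s(p_i, p_j)."""
    def sim(a, b):
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb) / len(wa | wb) if wa | wb else 0.0
    G = nx.Graph()
    G.add_nodes_from(range(len(posts)))
    for i in range(len(posts)):
        for j in range(i + 1, len(posts)):
            s = sim(posts[i], posts[j])
            if s >= threshold:
                G.add_edge(i, j, weight=s)
    return G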
There are many alternative ways to define a cohesive subgraph. The clique may be the best candidate in spirit, because any two of its nodes then have a sufficiently high similarity and thus share the same content. However, in real-world data sets, cliques are too restrictive a story definition, and this calls for relaxations. There are several choices of clique relaxations, e.g., quasi-clique, k-Core, k-Plex, etc. [47]. Given a connected subgraph S(V_S, E_S), and letting N(p) be the neighbor set of node p ∈ V_S in S(V_S, E_S), these clique relaxations are defined as:
• Quasi-clique: |N(p)| ≥ λ(|V_S| − 1) for every post p, with 0 < λ ≤ 1;
• k-Plex: |N(p)| ≥ |V_S| − k for every post p ∈ V_S;
• k-Core: |N(p)| ≥ k for every post p ∈ V_S.

Notice that finding the maximum clique, quasi-clique or k-Plex in a graph is an NP-Hard problem [8, 15]. Even worse, [31] proved that there is no polynomial algorithm providing any reasonable approximation to the maximum clique problem. Since a clique is a special case of a quasi-clique or a k-Plex, these hardness results carry over to the maximum quasi-clique and maximum k-Plex problems. There are many heuristic algorithms that provide no theoretical guarantee on quality [1, 8, 15]. Since most of them are based on local search [15], they do not scale to large networks, because local search optimizes a solution by iteratively moving to a better neighboring solution in an exponential search space.

The good news is that k-Cores can be found exactly in polynomial time. By adjusting k, we can generate k-Cores with the desired edge density 2|E_S|/|V_S|; for example, increasing k improves the edge density, because the minimal degree is increased while |V_S| decreases at the same time. However, k-Cores are not always capable of capturing our intuition about stories: although each post has at least k neighbors, the fact that two posts are connected by a single edge may not be strong enough evidence that they tell the same story. Sometimes posts share some common words but discuss different stories, e.g., "Google acquires Motorola Mobility" and "Bell Mobility acquires Virgin Mobile". We show an example of a 3-Core in Figure 5.2(a), where p1 and p2 are connected but belong to two separate cliques. To address this challenge, we make the key observation that the existence of more common neighbors between two edge-connected posts suggests a stronger commonality in story-telling. Supposing p_i and p_j are connected by an edge and there exists a post p_l ∈ N(p_i) ∩ N(p_j), we call p_l a witness for the post similarity s(p_i, p_j). We capture this intuition in the following definition, where we formalize a story as a maximal (k, d)-Core.

Definition 17 (Story in CAST) A story in a social stream Q is defined by a maximal (k, d)-Core S(V_S, E_S) in the post network G(V,E) associated with Q, where k, d are numbers with k > d > 0 and
• S(V_S, E_S) is a connected subgraph;
• for every post p ∈ V_S, |N(p)| ≥ k;
• for every edge (p_i, p_j) ∈ E_S, |N(p_i) ∩ N(p_j)| ≥ d.

Figure 5.2: (a) A 3-Core without a similarity witness between p1 and p2. (b) Illustration of generating a (3, 1)-Core from a 3-Core.
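Definition 17 can be checked directly. A minimal sketch (ours, using networkx):

import networkx as nx

def is_kd_core(G, nodes, k, d):
    """Definition 17 checker: the induced subgraph must be connected, every
    node must have degree >= k, and the endpoints of every edge must share
    at least d common neighbors (witnesses) inside the subgraph."""
    H = G.subgraph(nodes)
    if H.number_of_nodes() == 0 or not nx.is_connected(H):
        return False
    if any(H.degree(v) < k for v in H.nodes):
        return False
    return all(len(list(nx.common_neighbors(H, u, v))) >= d
               for u, v in H.edges)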
However, compared with k-Cores, (k, d)-Cores have a more cohesive internal structure, enhanced by at least d common neighbors serving as witnesses of the commonality between any two nodes connected by an edge. This enhancement makes the posts in a (k, d)-Core more likely to tell the same story. We show an example of a 3-Core and a (3,1)-Core in Figure 5.2(b). As we can see, p4 is in the 3-Core but not in the (3,1)-Core, because the similarity between p1 and p4 is not witnessed by any other post. Besides, [65] defines a kind of cohesive subgraph called k-Dense, in which every edge has at least k witnesses. It is easy to infer that a k-Dense is a (k+1, k)-Core. Thus, k-Dense is a special case of (k, d)-Core, but provides no flexibility to adjust k and d independently. Therefore, our proposed (k, d)-Core is better suited than k-Core and k-Dense to capturing a story.

Non-overlap Property. Similar to maximal k-Cores, a maximal (k, d)-Core cannot be a subgraph of any other (k, d)-Core. Maximal (k, d)-Cores have the following important property.

Proposition 6 The maximal (k, d)-Cores of a graph are pairwise disjoint.

Proof: Suppose Si(Vi, Ei) and Sj(Vj, Ej) are two different maximal (k, d)-Cores and Vi ∩ Vj ≠ ∅. Then we can construct a connected subgraph S(Vi ∪ Vj, Ei ∪ Ej). Since each node in S has degree at least k and each edge in S has at least d similarity witnesses, S satisfies the definition of a (k, d)-Core. This contradicts the assumption that Si and Sj are maximal. □

5.3.2 Story Formation

We now discuss the computation of (k, d)-Cores. We first review the k-Core generation process [47]: given a network G(V, E), iteratively remove the nodes with degree less than k from G(V, E), until all remaining nodes have degree at least k. The result is a set of maximal k-Cores, obtained in polynomial time. The k-Core generation process forms the basis of our (k, d)-Core generation algorithms. We propose the first solution for (k, d)-Core generation in Algorithm 4. We call it the Zigzag algorithm, because the basic idea is to repeatedly switch the current network between two states: the first state is the k-Core set G1, obtained by removing nodes recursively, and the second state is the (d + 1, d)-Core set G2, obtained by removing edges recursively. This Zigzag process terminates when each connected component in the result set is a (k, d)-Core, or when the result set is empty. It is easy to see that Algorithm 4 takes polynomial time and that the result is exact.

Algorithm 4: (k, d)-Core Generation: Zigzag
Input: G(V, E), k, d
Output: All maximal (k, d)-Cores
1  Generate a set of k-Cores G1 by removing nodes with degree less than k recursively from G(V, E);
2  while G1 is not empty do
3    Generate a set of (d + 1, d)-Cores G2 by removing edges with fewer than d witnesses recursively from G1;
4    if G1 equals G2 then
5      break while loop;
6    Generate a set of k-Cores G1 by removing nodes with degree less than k recursively from G2;
7    if G1 equals G2 then
8      break while loop;
9  return G1;

We notice that the transition costs between the two states are not symmetric: the computational costs of G1 and G2 are very different. If we can reduce the frequency of the transition with the higher overhead, the overall performance of Zigzag can be optimized. The following proposition formalizes this property.

Proposition 7 Given a network G(V, E) with |V| < |E| and integers k, d (k > d), for each iteration in Zigzag, the computation of the k-Core set G1 is more efficient than the computation of the (d + 1, d)-Core set G2.

Proof: To compute G1, we need to check the degree of every node. To compute G2, we need to check the common-neighbor size of the two end nodes of every edge. Given that |V| < |E| in most networks and that checking a node degree is cheaper than checking common neighbors, the claim follows. □

Proposition 7 suggests that if we reduce the computation frequency of the (d + 1, d)-Core set G2, the performance will improve. Following this, we propose Algorithm 5, which improves on Algorithm 4 by applying node deletions as much as possible.
The heuristic is: whenever an edge is deleted, we check whether this deletion makes the degree of either of its end nodes smaller than k, and if it does, a recursive node deletion process starting from those end nodes is performed (Line 7). We call Algorithm 5 NodeFirst because it greedily invokes the node deletion process whenever possible. Since NodeFirst avoids performing a complete edge deletion pass as Zigzag does, the network converges very quickly to the set of maximal (k, d)-Cores. The following propositions show that NodeFirst produces exactly the same result as Zigzag, but with better performance.

Algorithm 5: (k, d)-Core Generation: NodeFirst
Input: G(V, E), k, d
Output: All maximal (k, d)-Cores
1  Generate a set of k-Cores G′ by removing nodes with degree less than k recursively from G(V, E);
2  while G′ is not empty do
3    Find an edge e(pi, pj) with fewer than d witnesses;
4    if e(pi, pj) exists then
5      delete (pi, pj) from G′;
6      if Deg(pi) < k or Deg(pj) < k then
7        Remove nodes with degree less than k recursively from G′;
8    else
9      break while loop;
10 return G′;

Proposition 8 Given a network G(V, E) and numbers k, d (k > d), both Zigzag and NodeFirst generate the same result, which is the set of all maximal (k, d)-Cores in G(V, E).

Proof: Suppose S(k,d) is the set of all maximal (k, d)-Cores in G(V, E). Both Zigzag and NodeFirst only remove nodes or edges in each iteration, so the numbers of nodes and edges decrease monotonically. With this observation, the proof follows the algorithm logic: Zigzag converges to S(k,d) because the algorithm stops the first time a k-Core set and a (d + 1, d)-Core set are equal (Line 7 in Algorithm 4); NodeFirst converges to S(k,d) because it stops the first time no edge with fewer than d witnesses can be found in a k-Core set (Line 8 in Algorithm 5). □

Proposition 9 In each iteration, NodeFirst is more efficient than Zigzag.

Proof: We measure efficiency by the number of node-degree checks or edge similarity-witness checks. Per iteration, Zigzag takes |V| + |E| operations, while NodeFirst takes |V| + 1 operations. Thus, NodeFirst is more efficient than Zigzag. □
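To make the story formation step concrete, the following is a minimal Python sketch of NodeFirst-style (k, d)-Core generation over an undirected post network stored as an adjacency dictionary. The function names (prune_nodes, kd_cores) and the input representation are our own illustrative choices, not artifacts of the thesis implementation.

from collections import deque

def prune_nodes(adj, k):
    """Recursively remove nodes with degree < k (k-Core generation)."""
    queue = deque(p for p in adj if len(adj[p]) < k)
    while queue:
        p = queue.popleft()
        if p not in adj:
            continue
        for q in adj[p]:
            adj[q].discard(p)
            if len(adj[q]) < k:
                queue.append(q)
        del adj[p]

def kd_cores(adj, k, d):
    """NodeFirst-style (k, d)-Core generation: greedily delete low-degree
    nodes; delete an edge only when it has fewer than d witnesses."""
    adj = {p: set(qs) for p, qs in adj.items()}   # defensive copy
    prune_nodes(adj, k)
    while adj:
        weak = next(((p, q) for p in adj for q in adj[p]
                     if len(adj[p] & adj[q]) < d), None)
        if weak is None:
            break                       # every remaining edge has >= d witnesses
        p, q = weak
        adj[p].discard(q)
        adj[q].discard(p)
        if len(adj[p]) < k or len(adj[q]) < k:
            prune_nodes(adj, k)         # cascade node deletions first
    return adj                          # union of all maximal (k, d)-Cores

The connected components of the returned adjacency map are the maximal (k, d)-Cores, i.e., the transient stories.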
5.4 Story Context Tracking

Transient stories identified from the post network may be highly correlated, e.g., "the launch of BlackBerry 10" and "BlackBerry Super Bowl ad". In this section, we exploit and integrate signals from different measures for story relatedness computation, and propose deterministic context search (DCS) to construct the veins for a story S on the fly. The running time of DCS is proportional to the number of S's neighboring posts. For the case when the number of neighboring posts is huge, we propose randomized context search (RCS) to boost performance.

We remark that DCS and RCS are two general computation frameworks that can be applied to different definitions of stories. As long as a story is a connected subgraph in a post network, DCS and RCS can find the context between stories. That is to say, stories defined as λ-quasi-cliques and as (k, d)-Cores work seamlessly as two instantiations of the story concept in the same context tracking framework. For simplicity, in this section we take the (k, d)-Core as the structure instantiating a story and explain its context search.

5.4.1 Story Relatedness Dimensions

Recall that stories correspond to (k, d)-Cores in the post network, and let Si(Vi, Ei) and Sj(Vj, Ej) denote two stories. Here we introduce different types of relatedness dimensions, which capture story relatedness from different perspectives and quantify it by a value in [0, 1]. Notice that node overlap is a very common signal for assessing the relatedness between two subgraphs [69]. However, since Proposition 6 shows that the (k, d)-Cores generated by Zigzag or NodeFirst are pairwise node-disjoint, node overlap is not useful for story relatedness computation.

Dimension 1: Content Similarity. By viewing a story as a document and a post entity as a term, existing document similarity measures can be used to assess story relatedness. However, TF-IDF based cosine similarity fails to be effective, since the TF vectors of stories tend to be very sparse. We exploit another popular measure, the LDA-based symmetric KL-divergence [59]. Supposing d is the document representation of story S and θ(d) is the topic distribution of d produced by LDA, we have

    C1(Si, Sj) = [ KL(θ(di) ‖ θ(dj)) + KL(θ(dj) ‖ θ(di)) ] / 2        (5.1)

where KL(θ(di) ‖ θ(dj)) = Σx θx(di) log( θx(di) / θx(dj) ).

Dimension 2: Temporal Proximity. Stories that happen closer in time are more likely to be correlated. Given a story Si(Vi, Ei), we can build the histogram of its post volume along the time dimension, with each bin of size equal to the time window sliding step ∆t. Supposing Dist is the normalized distribution (summing to 1) of story S's histogram, we take the complement of the L1 norm as an example measure of the temporal proximity between Si and Sj:

    C2(Si, Sj) = 1 − Σt=1..len |Disti(t) − Distj(t)|        (5.2)

where len is the length of the observation time window.

Dimension 3: Edge Connectivity. Capillaries associated with the posts of two stories can be used to determine story relatedness. This approach calculates the strength of edge connectivity between the posts of Si and Sj in the post network. These edges serve as the bridge between two stories, and the connectivity strength can be measured in various ways, e.g., by the Jaccard coefficient based on the portion of bridge edges:

    C3(Si, Sj) = |E(Vi, Vj)| / |E(Vi, V) ∪ E(Vj, V)|        (5.3)

where E(A, B) is the set of edges between node sets A and B.

Baseline: Hybrid Relatedness Model. As a baseline approach, story relatedness can be computed by a hybrid model over all relatedness dimensions, of the form:

    CorH(Si, Sj) = Πk=1..3 Ck(Si, Sj)        (5.4)

It is easy to see that 0 ≤ CorH(Si, Sj) ≤ 1. One drawback of the hybrid relatedness model is its performance. First, LDA computation for a large corpus is expensive [70]. Second, to construct the context of story S in the story vein, the hybrid relatedness model needs to compute CorH(S, S′) between S and every other story S′ in 𝒢(𝒱, ℰ), which is redundant and time-consuming.
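The three relatedness dimensions are straightforward to compute once the per-story inputs are available. Below is a minimal Python sketch, assuming the LDA topic distributions, the normalized temporal histograms and the post adjacency (with sets as values) are precomputed; all function names are illustrative.

import math

def symmetric_kl(theta_i, theta_j, eps=1e-12):
    """Dimension 1 (Eq. 5.1): symmetric KL-divergence between two
    LDA topic distributions, with a small eps for numerical safety."""
    kl = lambda a, b: sum(x * math.log((x + eps) / (y + eps))
                          for x, y in zip(a, b))
    return 0.5 * (kl(theta_i, theta_j) + kl(theta_j, theta_i))

def temporal_proximity(dist_i, dist_j):
    """Dimension 2 (Eq. 5.2): complement of the L1 distance between
    the two normalized post-volume histograms."""
    return 1.0 - sum(abs(a - b) for a, b in zip(dist_i, dist_j))

def edge_connectivity(adj, Vi, Vj):
    """Dimension 3 (Eq. 5.3): Jaccard coefficient of bridge edges,
    where Vi and Vj are the post sets of the two stories."""
    bridge = {frozenset((p, q)) for p in Vi for q in adj.get(p, ()) if q in Vj}
    touching = {frozenset((p, q)) for p in Vi | Vj for q in adj.get(p, ())}
    return len(bridge) / len(touching) if touching else 0.0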
5.4.2 Story Context Search

Story context search aims at efficiently finding the neighbors of a story in the story vein 𝒢(𝒱, ℰ). However, simply applying the hybrid relatedness model of Eq. (5.4) results in an inefficient pairwise computation. To overcome this challenge, we introduce the iceberg query on the story vein, as stated below.

Definition 18 (Iceberg Query) Given a story vein 𝒢(𝒱, ℰ), a threshold γ (0 < γ < 1) and a story S (S ∉ 𝒱), the iceberg query for S finds the subset of stories 𝒱S ⊆ 𝒱 such that Cor(S, S′) ≥ γ for each S′ ∈ 𝒱S.

The iceberg query is the key technique supporting CAST. It grows the story vein on the fly: for each new story S, it can quickly build the relatedness links between S and the remaining stories. The iceberg query does not need to explore all stories in 𝒱 to generate the exact 𝒱S, which improves performance.

In this section, we propose two approaches to implement the iceberg query on the story vein. The first approach is deterministic; it integrates content similarity, temporal proximity and edge connectivity by aggregating all capillaries between two stories. The second approach is randomized; it improves the performance of the deterministic approach by filtering the flow from S to S′. They are discussed separately below.

Deterministic Context Search. The basic idea of deterministic context search (DCS) follows a propagation and aggregation process. To initialize, we treat story S(VS, ES) as the source and every other story S′(VS′, ES′) as a target. In the propagation phase, we start from each post in the source and broadcast the post similarity to neighboring posts along capillaries. In the aggregation phase, we aggregate the post similarity received by the posts in each target, denoted PAD(S, S′). In detail, we have

    PAD(S, S′) = Σ s(p, p′)   over p ∈ VS, p′ ∈ VS′, (p, p′) ∈ E        (5.5)

Since s(p, p′) is computed by combining content similarity and temporal proximity (see Eq. (3.1)), PAD(S, S′) naturally integrates the three story relatedness dimensions discussed above. The relatedness between S and S′ is assessed by the ratio of the post similarity aggregated by S′ to that propagated by S:

    CorD(S, S′) = PAD(S, S′) / PAD(S, G)        (5.6)

where PAD(S, G) is the sum of post similarities over all capillaries associated with posts in S. When CorD(S, S′) ≥ γ, there is a story link between S and S′ in the story vein 𝒢(𝒱, ℰ). Let Sτ denote the average time stamp of S, i.e., Sτ = (1/|VS|) Σp∈VS pτ. If Sτ < S′τ, the edge direction is from S to S′; otherwise, the edge direction is from S′ to S.

The sketch of DCS is shown in Algorithm 6. Notice that post similarity is not aggregated on every story: some stories receive zero or negligible post similarity and can be omitted. On average, a source story visits at most min{ |V|, |VS|·|E|/|V| } neighboring posts in DCS. Compared with the hybrid relatedness model, which needs to access the whole post network G(V, E) in a pairwise computation, DCS improves performance significantly.

Algorithm 6: Deterministic Context Search (DCS)
Input: G(V, E), 𝒢(𝒱, ℰ), new story S(VS, ES), γ
Output: In-neighbor set NI(S), out-neighbor set NO(S)
1  PAD(S, G) = 0, N(S) = ∅, NI(S) = ∅, NO(S) = ∅;
2  Sτ = (1/|VS|) Σp∈VS pτ;
3  for each p ∈ VS do
4    for each neighbor p′ of p in G(V, E) do
5      if p′ is in a story S′ then
6        if S′ ∉ N(S) then
7          add S′ into N(S);
8        add s(p, p′) into PAD(S, G) and PAD(S, S′);
9  for each S′ ∈ N(S) do
10   CorD(S, S′) = PAD(S, S′) / PAD(S, G);
11   if CorD(S, S′) ≥ γ then
12     S′τ = (1/|VS′|) Σp′∈VS′ p′τ;
13     if Sτ < S′τ then add S′ into NO(S);
14     else add S′ into NI(S);
15 return NI(S) and NO(S);

The propagation and aggregation process is also extensively discussed in the structural similarity search problem [45]. In [45], the similarity between two nodes is modeled as the authority score of the corresponding node pair, and the propagation and aggregation process on node pairs is used to compute the similarity score accurately.
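A minimal Python sketch of the propagation-and-aggregation idea behind Algorithm 6 follows. It assumes the post network as an adjacency dictionary adj, a similarity map sim that holds every capillary keyed by a post pair, a map story_of from posts to story ids, and a map story_time of average story time stamps; these containers are illustrative assumptions, not part of the thesis artifacts.

def dcs(adj, sim, story_of, S_id, S_posts, story_time, gamma):
    """Deterministic context search: propagate post similarity from the
    source story and aggregate it per target story."""
    def w(p, q):
        return sim[(p, q)] if (p, q) in sim else sim[(q, p)]
    pad, total = {}, 0.0                 # PAD(S, S') and PAD(S, G)
    for p in S_posts:
        for q in adj.get(p, ()):
            s_prime = story_of.get(q)    # story containing post q, if any
            if s_prime is None or s_prime == S_id:
                continue
            pad[s_prime] = pad.get(s_prime, 0.0) + w(p, q)
            total += w(p, q)
    n_in, n_out = [], []
    for s_prime, mass in pad.items():
        if total > 0 and mass / total >= gamma:      # CorD(S, S') >= gamma
            # direct the vein edge from the earlier story to the later one
            if story_time[S_id] < story_time[s_prime]:
                n_out.append(s_prime)
            else:
                n_in.append(s_prime)
    return n_in, n_out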
Randomized Context Search. To further improve the performance of deterministic context search, we propose randomized context search (RCS), based on random walks. The motivation behind RCS is that, in the propagation phase, it is preferable to visit only the neighboring posts with a high post similarity. Given a post p in story S, suppose max(p) and min(p) are the maximum and minimum post similarities associated with p, respectively. Technically, we use α (0 ≤ α ≤ 1) as a throttle to propagate only high post similarities: only if s(p, p′) ≥ min(p) + α(max(p) − min(p)) does RCS propagate s(p, p′) to p′, with uniformly random probability.

There are two special cases in RCS:
• α = 0: post p propagates to every neighboring post;
• α = 1: post p propagates only to the neighboring post with the highest post similarity.

We empirically choose α = 0.5. Suppose N is the total number of simulations we run for the source story S. In each simulation, a random surfer starts from a post p in S and randomly visits a neighbor p′ such that s(p, p′) ≥ min(p) + α(max(p) − min(p)). The post similarity propagated from source S to target S′ in simulation i can be described as

    Ri(S, S′) = s(p, p′)   if p ∈ VS and p′ ∈ VS′;   0 otherwise        (5.7)

The post similarity aggregated by S′ over N simulations is denoted PAR(S, S′):

    PAR(S, S′) = Σi=1..N Ri(S, S′)        (5.8)

We show the sketch of RCS in Algorithm 7. Similar to DCS, the relatedness between S and S′ is computed by CorR(S, S′) = PAR(S, S′) / PAR(S, G). The vein threshold and direction are determined in the same way as in DCS. Clearly, in RCS a source story visits at most N neighboring posts, and usually N ≪ |V|. The value of N is decided by the time budget for story context search in a real CAST system. Compared with DCS, RCS achieves better performance by visiting a smaller number of neighboring posts, connected by capillaries with higher post similarity.

Algorithm 7: Randomized Context Search (RCS)
Input: G(V, E), 𝒢(𝒱, ℰ), new story S(VS, ES), γ, α, n
Output: In-neighbor set NI(S), out-neighbor set NO(S)
1  PAR(S, G) = 0, N(S) = ∅, NI(S) = ∅, NO(S) = ∅, count = 0;
2  Sτ = (1/|VS|) Σp∈VS pτ;
3  while count < n do
4    randomly select a node p ∈ VS;
5    randomly select a neighbor p′ of p in G(V, E);
6    while s(p, p′) < min(p) + α(max(p) − min(p)) do
7      randomly select a neighbor p′ of p in G(V, E);
8    if p′ is in a story S′ then
9      if S′ ∉ N(S) then
10       add S′ into N(S);
11     add s(p, p′) into PAR(S, G) and PAR(S, S′);
12   count = count + 1;
13 for each S′ ∈ N(S) do
14   CorR(S, S′) = PAR(S, S′) / PAR(S, G);
15   if CorR(S, S′) ≥ γ then
16     S′τ = (1/|VS′|) Σp′∈VS′ p′τ;
17     if Sτ < S′τ then add S′ into NO(S);
18     else add S′ into NI(S);
19 return NI(S) and NO(S);
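The random-walk variant can be sketched along the same lines, under the same illustrative containers as the DCS sketch above. One deviation worth noting: instead of resampling a neighbor until the throttle condition holds, as the rejection loop in Algorithm 7 does, the sketch below filters the qualifying neighbors up front and samples among them, which has the same effect but a bounded cost per step.

import random

def rcs(adj, sim, story_of, S_id, S_posts, story_time,
        gamma, alpha=0.5, n=1000):
    """Randomized context search: n random-walk steps that only follow
    capillaries whose similarity passes the alpha throttle."""
    def w(p, q):
        return sim[(p, q)] if (p, q) in sim else sim[(q, p)]
    par, total = {}, 0.0                 # PAR(S, S') and PAR(S, G)
    posts = list(S_posts)
    for _ in range(n):
        p = random.choice(posts)
        nbrs = list(adj.get(p, ()))
        if not nbrs:
            continue
        lo = min(w(p, q) for q in nbrs)
        hi = max(w(p, q) for q in nbrs)
        cand = [q for q in nbrs if w(p, q) >= lo + alpha * (hi - lo)]
        q = random.choice(cand)          # cand is never empty (hi qualifies)
        s_prime = story_of.get(q)
        if s_prime is None or s_prime == S_id:
            continue
        par[s_prime] = par.get(s_prime, 0.0) + w(p, q)
        total += w(p, q)
    n_in, n_out = [], []
    for s_prime, mass in par.items():
        if total > 0 and mass / total >= gamma:
            if story_time[S_id] < story_time[s_prime]:
                n_out.append(s_prime)
            else:
                n_in.append(s_prime)
    return n_in, n_out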
5.4.3 Interpretation of Story Vein

Since the story vein is described in graph notation, it is important to explain it to end users, who may be non-technical. There are various options for presenting a (k, d)-Core in human-consumable form. The first option is annotating a (k, d)-Core with its most representative posts, e.g., the posts with the highest degree. The second option is rendering a (k, d)-Core into a word cloud: since we extract entities from each post, the frequency of an entity in a (k, d)-Core can be used to set the font size of that entity in the word cloud. In this chapter, we provide both options to aid human perception. We show some interpretation examples in the experimental study.

5.5 Experimental Study

All experiments are conducted on a computer with an Intel 2.66 GHz CPU and 8 GB RAM, running 64-bit Windows 7. All algorithms are implemented in Java. We empirically set the thresholds ε = γ = 0.2 for the construction of the post network and the story vein, and set α = 0.5 to filter out capillaries with low post similarity in RCS.

All data sets are crawled from Twitter.com via the Twitter4J API (http://twitter4j.org/), with a time span from Jan. 1 to Feb. 1, 2012. Although our story-teller CAST works regardless of the domain, we make the data sets domain-specific in order to facilitate the evaluation of the generated results.

CNN-News. To simulate a social news stream, we collect tweets created in the first half of 2014 from CNN channels, which include @cnn, @cnnbrk, @cnnmoney, @cnnlive and @cnni. This data set has 10,872 tweets and serves for the quality analysis.

Tech-Lite. We built a technology-domain dataset called Tech-Lite by aggregating the timelines of users listed in the Technology category of "Who to follow" (http://twitter.com/who_to_follow/interests) and their retweeted users. Tech-Lite has 352,328 tweets and 1,402 users, and its streaming rate is 11,700 tweets/day.

Tech-Full. Based on the intuition that users followed by users in the Technology category are most likely also in the technology domain, we obtained a larger technology social stream called Tech-Full by collecting all the timelines of users that are followed by users in the Technology category. Tech-Full has 5,196,086 tweets created by 224,242 users, and its streaming rate is about 7,216 tweets/hour.

5.5.1 Tuning Post Network

The post similarity computation in Eq. (3.1) directly influences the structure of the post network. We can tune the content similarity function sT(pTi, pTj) and the temporal proximity function sτ(|pτi − pτj|) to make the post network concise and expressive. Many set-based similarity measures, such as the Jaccard coefficient [53], can be used to compute the similarity sT(pTi, pTj) between posts. Since an entity usually appears only once in a tweet, similarity measures such as cosine similarity and Pearson correlation [53] degenerate to forms very similar to Jaccard, so we use Jaccard as our similarity function and omit further discussion of the alternatives.

The temporal proximity function sτ(|pτi − pτj|) determines how similarity to older posts is penalized, compared to recent posts. We compared three different functions: (1) reciprocal fading ("Reci-Fading") with D(|pτi − pτj|) = 1 / (|pτi − pτj| + 1); (2) exponential fading ("Exp-Fading") with D(|pτi − pτj|) = e^(−|pτi − pτj|); and (3) no fading ("No-Fading") with D(|pτi − pτj|) = 1. For any posts pi, pj, clearly e^(|pτi − pτj|) ≥ |pτi − pτj| + 1 ≥ 1. Exp-Fading penalizes the posts in the old part of the time window severely (see Table 5.2): the numbers of capillaries and stories generated under Exp-Fading are lower than under the other approaches. Since No-Fading does not penalize old posts in the time window, too many capillaries and stories are generated, without considering recency. Reci-Fading is in between, being a more gradual penalty function than Exp-Fading. In the rest of the experiments we use Exp-Fading, since it generates the most concise post network, with an emphasis on recent posts.

Table 5.2: The number of edges in the post network, and the numbers of stories and relatedness links in the story vein, as the temporal proximity function changes. We define a story as a (5, 3)-Core.
  Method        Capillaries    Stories    Vein Links
  No-Fading     1,159,364      373        495
  Exp-Fading    327,390        87         275
  Reci-Fading   357,132        123        384
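For reference, the three fading schemes can be stated in a few lines of Python; the function below is a sketch that assumes the time gap dt is expressed in the same units as the time window.

import math

def fading(dt, scheme="exp"):
    """Temporal fading factor D(|p_tau_i - p_tau_j|) for a gap dt >= 0,
    under the three schemes compared in Table 5.2."""
    if scheme == "exp":            # Exp-Fading: strongest recency bias
        return math.exp(-dt)
    if scheme == "reci":           # Reci-Fading: a more gradual penalty
        return 1.0 / (dt + 1.0)
    return 1.0                     # No-Fading: recency is ignored

# e.g., for a 3-unit gap: exp gives about 0.05, reci gives 0.25, none gives 1.0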
5.5.2 Quality Evaluation

In this section, we evaluate the quality of CAST on two tasks, story discovery and context search, by comparing with baselines.

Ground Truth. We build the ground truth of tech stories in January 2012 by browsing news articles on mainstream technology websites such as CNET.com, TechCrunch.com and ReadWrite.com. We pick headlines with high occurrence and build a ground truth of the top 10 stories. They include CES-related stories, SOPA & PIPA related stories, the Facebook IPO, Yahoo co-founder Jerry Yang's resignation, the RIM CEO change, etc.

Baseline 1: Peak Detection. Some recent work on event detection [48, 55, 67] can be used for story discovery. These methods share the same spirit: aggregate the frequency of topic-indicating phrases at each moment to build a histogram, and generate stories by detecting volume peaks in the histogram. We design three baselines to capture the major techniques used by these approaches:
• HashtagPeaks: aggregate frequent hashtags;
• MentionPeaks: aggregate frequent mentions;
• EntityPeaks: aggregate frequent entities.

  Rank       HashtagPeaks      MentionPeaks     EntityPeaks
  1          #CES              @CNETNews        google
  2          #SOPA             @thatdrew        ces
  3          #EngadgetCES      @m4tt            apple
  4          #opengov          @TNWapple        video
  5          #gov20            @TNWinsider      sopa
  6          #CES2012          @TNWapps         twitter
  7          #PIPA             @TGW             year
  8          #opendata         @jonrussell      facebook
  9          #StartupAmerica   @CNET            app
  10         #win7tech         @harrisonweber   iphone
  Precision  0.5               0.2              0.7
  Recall     0.4               0.2              0.5

Figure 5.3: Top 10 results of HashtagPeaks, MentionPeaks and EntityPeaks on the Tech-Lite dataset. The ground truth for precision and recall is the top 10 major stories selected from mainstream technology news websites.

We show the top 10 results of HashtagPeaks, MentionPeaks and EntityPeaks on the Tech-Lite dataset, with time stamps between Jan 1 and Feb 1, 2012, in Figure 5.3, together with their precision and recall against the ground truth. Notice that multiple peaks may correspond to the same story in the ground truth, in which case precision and recall differ. As we can see, MentionPeaks is the worst, because most mentions are not topic-indicating phrases. Hashtags are the "Twitter way" to indicate an event, but they require manual assignment by humans. EntityPeaks is the best of the three baselines, since entities are extracted in the social stream preprocessing stage to annotate stories. Although these baselines are able to surface some words about a story, they do not qualify as a successful story-teller, because the internal and external structure of the story is missing. In particular, they cannot show how users interact with each other, how posts are clustered, or how the story is distributed along the time dimension.

Baseline 2: Topic Modeling. Topic modeling is a typical way to detect topics from a text corpus. As mentioned earlier, existing work on topic detection and tracking is formulated as a classification problem, and prevalent topic models like Latent Dirichlet Allocation (LDA) [13] are mainly trained on formally written news articles. We design a baseline using LDA and treat the top 100 posts of each topic as a story. Post time stamps are totally ignored in LDA.
We set the number of topics to 50, and after rendering topics into stories, we show the top 20 stories in Table 5.3. As we can see, the topic cohesion of LDA-detected stories is not very high: posts sharing some common words are easily classified into the same topic, even when they are not talking about the same story. Besides, the LDA model cannot deal with noisy posts, so its quality is compromised on social streams.

Table 5.3: Top 20 stories detected by LDA from Tech-Full, described by high-frequency words. We treat the top 100 posts of each topic as a story.
  1   blog post, great buy in january, start sign
  2   report amazon kindle fire
  3   check real youtube interview video series
  4   win chance south beach trip
  5   small business company story, local community
  6   make great start, things learn
  7   bad thing in life, feel good
  8   send email hey, glad to hear, follow tweet
  9   jan travel, days ago, back home tonight
  10  president obama, iran war killed
  11  daily top stories, tech news, health care
  12  apple tv iphone ipad, ces event, app store
  13  watch night show, tonight fun, miss baby
  14  twitter mobile app, android apps, tweet free
  15  Iowa caucus results, Santorum and Romney
  16  play half game, nyc new york city, winter fans
  17  super bowl, man football party
  18  happy new year, hope wonderful year, friends family
  19  love word, perfect friend, people
  20  google internet search ad, https link

CAST on Tech-Lite. Recall that our proposed story-teller CAST uses a (k, d)-Core to represent a story, which is a cohesive subgraph in the post network. Figure 5.4 shows the top 10 transient stories generated by CAST, each represented as a (5,3)-Core. It is worth noting that there are various ways to present stories in the user interface; in this experiment, we render each story into an entity cloud, annotated with a short description. Interestingly, we observe that the curve of tweet frequency goes down every weekend, which reflects real-life habits. Compared with the ground truth, our story-teller achieves a precision of 0.9 and a recall of 0.8.

[Figure 5.4: Top 10 transient stories generated by our proposed story-teller on Tech-Lite, including "hug/hope new year", "CES tomorrow", "CES", "SOPA & PIPA", "Google anti-SOPA", "Wiki anti-SOPA blackout", "Yahoo Jerry Yang resigns", "Supreme court GPS decision", "RIM changes CEO" and "Facebook IPO". Each story is represented as a (5,3)-Core and rendered as an entity cloud; related stories are linked by lines. The curve at the bottom is the breakdown of tweet frequency on each day in January 2012.]

For precision, the only case where we fail is that "hug/hope new year" is not a story in the ground truth. The reason may be that many people tweet "happy new year", but this is not formally reported on technology news websites, since it is a well-known fact. For recall, the ground-truth story missing from our top 10 is "Apple iBooks 2 Release" on Jan 19; it is ranked in the 13th position by our story-teller.

We can also see that stories may be related to each other. In Figure 5.4, there are multiple stories about CES (Consumer Electronics Show) 2012, held from Jan 8 to Jan 13. Meanwhile, the stories related to SOPA & PIPA are highly related, even though each of them tells a relatively different story.

When inspecting the top 50 transient stories returned by CAST, we observe two major advantages compared with news reports.
First, our story-teller can identify stories that are barely noticeable on news websites. For example, "Rupert Murdoch joins Twitter" is a story widely spread in social streams, but not commonly reported in news articles. Second, the formation of a transient story in social streams generally occurs earlier than the first publication of the corresponding news articles. [48] also observed this phenomenon, and the typical time lag is around 2.5 hours.

CAST on Tech-Full. Figure 5.6 shows an example illustrating the story vein on Tech-Full. To aid understanding, we group stories densely connected by vein links into rectangles, where each rectangle can be viewed as a story series. As we can see, "CES" and "SOPA & PIPA" are two very different story series, but they can be connected by indirect links via posts such as "CES 2012: Microsoft Keynote With Steve Ballmer" and "Microsoft opposes SOPA". CAST tracks the relatedness between stories and builds a highly informative story vein for the story-teller.

[Figure 5.6: An example illustrating our context-aware story-teller. Each tag cloud is a single story identified from the post network. Sets of stories with higher relatedness are grouped together in rectangles to aid readability.]

CAST on CNN-News. CNN-News simulates the daily social stream received by a Twitter user and has a time span of six months. We show a fragment of the story vein tracked from CNN-News in Figure 5.5, with stories such as "shooter at mall in Columbia, Maryland", "Malaysia Airlines flight lost", "Putin and Crimea", "Australian authorities update the search for MH370", "landslide in Washington state", "shooting at Fort Hood", "South Korean ferry deaths" and "sponsor backlash against Clippers owner Donald Sterling". In this example, "MH370 lost and search" is a story series with an internal structure that organizes individual stories together. Users can click on each story to see the details, or traverse the story veins to read highly related stories happening earlier or later. Compared with Twitter Trends Search (https://twitter.com/search-home), which shows trending topics, CAST has the distinct advantage of providing both the internal structure and the external context of a specific story.

[Figure 5.5: A fragment of the story vein tracked from CNN-News, which has a time span from January 1 to June 1, 2014.]

5.5.3 Performance Testing

In this section, we test the performance of the proposed algorithms. All tests are performed on the Tech-Full dataset with the time window set to one week. On average, the post network has about 710,000 nodes and 4,020,000 edges.

Story Identification. Recall that both k-Core and (k, d)-Core generation take polynomial time. We test the running time of generating all maximal k-Cores and (k, d)-Cores from the post network and show the results in Figure 5.7(a), with k = d + 1. As d increases, the running time of k-Core generation drops. The reason is that, although a bigger k means more nodes need to be removed in each iteration, the total number of iterations drops even more quickly. The running time for (k, d)-Cores is nearly stable, but much higher than for k-Core generation. This observation implies that NodeFirst should be much faster than Zigzag, since NodeFirst performs k-Core generation whenever possible. This conclusion is supported by the experimental results in Figure 5.7(a).
The relationship between d and the number of connected components found in the post network is shown in Figure 5.7(b). With increasing d, the numbers of both (d + 1)-Cores and (d + 1, d)-Cores drop. Since a (d + 1, d)-Core is at least a (d + 1)-Core but more cohesive, we may find multiple (d + 1, d)-Cores inside one (d + 1)-Core by removing the edges that do not satisfy the (k, d)-Core definition. Thus, given the same post network, the number of (d + 1, d)-Cores is usually higher than the number of (d + 1)-Cores.

Iceberg Queries. In CAST, we use the iceberg query to construct possible vein links between a given story and all other stories in the post network. As discussed in Section 5.4, we treat the given story as the source and all other stories as targets, and the story relatedness threshold γ governs the construction of story vein links.

In Figure 5.7(c), we treat each story in the post network as a query and show the total running time of iceberg queries over all stories. As d increases, the number of (d + 1, d)-Cores decreases gradually (Figure 5.7(b)). This makes the total size of the transient stories smaller, and naturally the total running time of the iceberg queries decreases. In the experiments, we can see that the performance of RCS is remarkably better than that of DCS. For quality, we show the accuracy of RCS in Table 5.4 as the number of simulations n grows. We define n as a number proportional to the number of neighboring posts of story S; as we can see, 20% simulations already produce an accuracy higher than 95%.

[Figure 5.7: (a) Running time of (d + 1)-Core generation and (d + 1, d)-Core generation (Zigzag, NodeFirst). (b) The number of connected components generated as (d + 1)-Cores and (d + 1, d)-Cores. (c) Running time of different context search approaches (DCS, RCS, hybrid relatedness model). All experiments run on the Tech-Full dataset with the time window set to one week.]

Table 5.4: The accuracy of RCS as the number of simulations n grows. We define n as a number proportional to the number of neighboring posts of story S, and measure accuracy against DCS.
  n / |neighboring posts of S|                          10%    20%    30%    40%
  Avg( 1 − |CorD(S, S′) − CorR(S, S′)| / CorD(S, S′) )  0.91   0.95   0.98   0.99

5.6 Discussion and Conclusion

In this chapter, we focused on two problems: (1) efficiently identifying transient stories from fast streaming social content; (2) performing iceberg queries to build the structural context between stories. To solve the first problem, we transform the social stream in a time window into a post network, and model transient stories as (k, d)-Cores in the post network. Two polynomial-time algorithms are proposed to extract maximal (k, d)-Cores. For the second problem, we propose deterministic context search and randomized context search to support the iceberg query, which allows context search without pairwise comparison. We performed a detailed experimental study on real Twitter streams, and the results demonstrate the effectiveness and value of our proposed context-aware story-teller CAST. In future work, we are interested in mining opinions from transient stories, e.g., the sentiment on a political event.
Besides, we are interested in various visualization techniques to present transient stories to end users in a friendly way.

Chapter 6

Incremental Event Evolution Tracking

In Chapter 3, we discussed the modeling of an event as a density cluster. Density-based clustering is superior to other clustering methods, such as K-Means or hierarchical clustering, for event identification, since it is robust to noisy posts and can quickly find core posts in social streams. As the post network changes over time, maintaining density clusters without recomputation from scratch is very important, and it is a highly challenging task. In this chapter, we focus on tracking event evolution patterns, which corresponds to incremental density-based cluster maintenance at the post network level.

6.1 Introduction

People easily feel overwhelmed by the information deluge coming from social websites. There is an urgent need to provide users with tools that can automatically extract and summarize significant information from highly dynamic social streams, e.g., report emerging bursty events, or track the evolution of one or more specific events in a given time span. There are many previous studies [10, 26, 48, 55, 66, 67] on detecting newly emerging events from text streams; they serve the need of answering the query "what's trending now?" over social streams. However, in many scenarios, users may want to know more details about an event and may like to issue advanced queries like "how're things going?". For example, for the event "SOPA (Stop Online Piracy Act) protest" happening in January 2012, existing event detection approaches can discover bursty activities at each moment, but cannot answer queries like "how has the SOPA protest evolved in the past few days?". An ideal answer to such an evolution query would be a "panoramic view" of the event history, which improves the user experience. From a graph perspective, we can model social streams as dynamically evolving post networks and model events as clusters over these networks, obtained by means of a clustering approach that is robust to the large amount of noise present in social streams.

[Figure 6.1: The post network captures the correlation between posts in the time window at each moment, and evolves as time rolls on. The skeletal graph is shown in bold. From moment t to t + 1, the incremental tracking framework maintains the clusters (e.g., C1: decay, C2: growth, C3: birth) and monitors the evolution patterns on the fly.]

Accordingly, we consider the above kind of query an instance of the cluster evolution tracking problem, which aims to track the cluster evolution patterns at each moment in such dynamic networks. Typical cluster evolution patterns include birth, death, growth, decay, merge and split. Event detection can be viewed as a subproblem of cluster evolution tracking in social streams.

In this chapter, we propose an incremental tracking framework for cluster evolution over highly dynamic networks. To illustrate the techniques in this framework, we consider the event evolution tracking task in social streams as an application, where a social stream and an event are modeled as a dynamic post network and a post cluster, respectively. The reasons we deploy our framework on this application are: social streams usually surge very quickly, making them ideal for performance evaluation, and events are human-readable, making it convenient to assess quality.
In detail, since a significant portion of social posts (like tweets) are just noise, we first define a skeletal graph as a compact summary of the original post network, from which post clusters can be generated. Then, as we discuss later, we monitor the network updates with a fading time window, and capture the evolution patterns of networks and clusters by a group of primitive evolution operations and their algebra. Moreover, we extend node-by-node evolution to subgraph-by-subgraph evolution to boost the performance of cluster evolution tracking. Figure 6.1 gives an overview of the major modules we use for cluster evolution tracking in social streams.

We notice that, at a high level, our method resembles previous work on density-based clustering over streaming data, e.g., DenStream [17], DStream [18] and the cluster maintenance in [2] and [5]. However, there are several major differences from this body of work. First, our approach works on highly dynamic networks and provides users the flexibility of choosing the scope for tracking by means of a fading time window. Second, the existing work can only process the addition of nodes/edges one by one, while our approach can handle adding, deleting and fading of nodes in bulk mode, i.e., subgraph by subgraph. This is an important requirement for dealing with the high throughput rate of dynamic networks. Third, the focus of our approach is tracking and analyzing the cluster evolution dynamics over the whole life cycle. By contrast, the previous works focus on clustering streaming data, which is a sub-task of our problem.

On the application side, compared with topic tracking approaches, we note that they are usually formulated as a classification problem [4]: when a new story arrives, compare it with the topic features in the training set by decision trees or k-NN [78], and if it matches sufficiently, declare it to be on a topic. Since these approaches assume that topics are predefined before tracking, we cannot simply apply them to event evolution tracking in social streams. Compared with existing event detection and tracking approaches [48, 55, 66, 67], our framework has advantages in tracking the whole life cycle and in capturing composite evolution behaviors such as merging and splitting.

The problem is formalized in Sec. 6.2. For convenience, we summarize the major notations used in this chapter in Table 6.1.

Table 6.1: Notation.
  SF(p1, p2)     the fading similarity between posts p1 and p2
  (ε, δ)         similarity threshold, priority threshold
  wt(p)          the priority of post p at moment t
  Gt(Vt, Et)     the post network at moment t
  Gold           the old subgraph that lapses at moment t + 1
  Gnew           the new subgraph that appears at moment t + 1
  Ḡt(V̄t, Ēt)     the skeletal graph at moment t
  C̄, S̄t          a component, a component set in Ḡt
  C, St          a cluster, a cluster set in Gt
  N(p)           post p's neighbor set with similarity larger than ε
  Nc(p)          the cluster set of post p's neighboring core posts

6.2 Problem Formalization

We formally define dynamic networks and dynamic clusters here, and then introduce the problem this chapter seeks to solve.

Dynamic Network. A dynamic network is a network with node and edge updates over time. We define a snapshot of a dynamic network at moment t as a weighted graph Gt(Vt, Et), where an edge e(u, v) ∈ Et connects nodes u, v ∈ Vt and s(u, v) is the similarity between them. For the problem studied in this chapter, we assume a dynamic network is the input and s(u, v) ∈ (0, 1] is a value usually set by a specific similarity function.
From time t to t + 1, we use ∆Gt+1 to describe the updating subgraph applied to Gt, i.e., Gt + ∆Gt+1 = Gt+1. In real applications, the size of ∆Gt+1 is typically much smaller than the size of Gt or Gt+1, and we make this an assumption throughout this chapter. Naturally, a dynamic network G1:i from moment 1 to i can be described as a continuous sequence of updating subgraphs applied at each moment. The formal definition is given below.

Definition 19 A dynamic network Gi:j from moment i to j is denoted as (Gi; ∆Gi+1, · · · , ∆Gj), where Gi(Vi, Ei) is a weighted graph with node set Vi and edge set Ei, and ∆Gt+1 (i ≤ t < j) is an updating subgraph at moment t + 1 such that Gt + ∆Gt+1 = Gt+1.

When the network evolves from Gt to Gt+1, we reasonably assume that at moment t + 1 only a small portion of Gt is incrementally updated, i.e., |Vt+1 − Vt| + |Vt − Vt+1| ≪ |Vt| and |Et+1 − Et| + |Et − Et+1| ≪ |Et|. This assumption generally holds in practice, and when it does not, we can shorten the moment interval sufficiently to make it hold. For simplicity, we express ∆Gt+1 as a sequence of node additions and deletions, e.g., ∆Gt+1 := +v1 − v2 means adding node v1 and all the edges incident to v1 in a single operation, and, analogously, deleting node v2 and its incident edges in a subsequent operation.

Dynamic Clusters. Suppose Ct is a subgraph of Gt, and isCluster(Ct) is a Boolean function that validates whether Ct is a cluster or not, with the exact definition given in Sec. 6.4.2. In the following, we define a dynamic cluster.

Definition 20 A dynamic cluster Ci:j from moment i to j is denoted as (Ci; ∆Ci+1, · · · , ∆Cj), where isCluster(Ci) = True, and ∆Ct+1 (i ≤ t < j) is an updating subgraph at moment t + 1 that makes Ct + ∆Ct+1 = Ct+1 and isCluster(Ct+1) = True.
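To fix ideas, a minimal Python sketch of a dynamic network under the node-level updates of Definition 19 follows; the class and method names are illustrative assumptions, and the similarity values are assumed to be supplied with each added node.

class DynamicNetwork:
    """A sketch of the dynamic post network: a snapshot Gt plus the bulk
    update Gt + dG(t+1) = G(t+1), expressed as node deletions followed
    by node additions (each node carries its incident weighted edges)."""

    def __init__(self):
        self.adj = {}                           # node -> {neighbor: similarity}

    def add_node(self, v, edges):
        """+v: add node v together with its incident edges {u: s(u, v)}."""
        self.adj[v] = dict(edges)
        for u, s in edges.items():
            self.adj.setdefault(u, {})[v] = s

    def del_node(self, v):
        """-v: delete node v and all its incident edges."""
        for u in self.adj.pop(v, {}):
            self.adj[u].pop(v, None)

    def apply_delta(self, added, removed):
        """Apply an updating subgraph: Gt -> G(t+1)."""
        for v in removed:
            self.del_node(v)
        for v, edges in added.items():
            self.add_node(v, edges)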
The traditional approaches for tracking dynamic network relatedproblems usually follow a “divide-and-conquer” spirit [38], which consists ofthree components: (1) decompose a dynamic network into a series of snap-shots for each moment, (2) apply graph mining algorithms on each snapshotto find useful patterns, (3) match patterns between different moments to gen-erate a dynamic pattern sequence. Applied to our problem, to track clusterevolution patterns, these steps are:Step 1©: At moment t, identify the cluster set St from Gt;Step 2©: At moment t + 1, as the network evolves, generate Gt+1 fromGt using ∆Gt+1;Step 3©: Again, identify the cluster set St+1 from Gt+1;Step 4©: Generate cluster evolution patterns from time t to t + 1 bytracing the correspondence between St and St+1.However, this approach suffers from both performance and quality. Firstly,repeated extraction of clusters from large networks from scratch is a very ex-pensive operation (steps 1© and 3©), and tracing the correspondence betweencluster sets at successive moments is also expensive (step 4©). Secondly, thestep of tracing correspondence, since it is done after two cluster sets aregenerated, may lead to loss of accuracy. In contrast, the method we proposeis incremental tracking of cluster evolution, which corresponds to step 5© inFigure 6.2(a). The workflow of this incremental tracking from time t to t+1is illustrated in Figure 6.2(b). More precisely, for the very first snapshot ofthe dynamic network, say G0, our approach will generate the correspondingevent set S0 from scratch. After this, we update the existing cluster setby the changed parts and move to the next moment recursively, by only ap-plying step 5©, i.e., we incrementally derive St+1 from St and ∆Gt+1. Theexperiments on real data set show that our incremental tracking approachoutperforms the traditional baselines in both performance and quality.6.4 Skeletal Graph ClusteringThe functional relationships between different types of objects defined inthis paper are illustrated in Figure 6.3. As an example, the arrow from Gtto Gt with label Ske means Gt is derived from Gt by function Ske, i.e.,Gt = Ske(Gt). See Table 6.1 for notations used. The various objects andtheir relationships will be explained in the rest of the paper.946.4. Skeletal Graph ClusteringtGtGtStSSkeCluCCSke GenSke Gen  Skeletal Cluster Skeletal Graph Dynamic NetworkFigure 6.3: The functional relationships between different types of objectsdefined in this paper, e.g., the arrow from Gt to Gt with label Ske meansGt = Ske(Gt). Refer to Table 6.1 for notations.6.4.1 Node PrioritizationIn reality, many posts tend to be just noise, so it is essential to identify thosenodes that play a central role in clusters. On web link graph analysis, thereis a lot of research on node authority ranking, e.g., HITS and PageRank[53]. However, most of these methods are iterative and not applicable tothe single-pass computation on streaming data. Node prioritization is atechnique to quickly differentiate and rank the processing order of nodes bytheir roles in a single pass. It is extremely useful in big graph mining, wherethere are too many nodes to be processed and many of them are of littlesignificance. However, to the best of our knowledge, there is insufficientstudy on single-pass node prioritization in a streaming environment.In this paper, we perform node prioritization based on density parameters(ε, δ), where 0 < ε < 1, and ε ≤ δ. 
In density-based clustering (e.g.,DBSCAN [20]), the threshold MinPts is used as the minimum number ofnodes in an ε-neighborhood, required to form a cluster. We adapt this anduse a weight threshold δ as the minimum total weight of neighboring nodes,required to form a cluster. The reason we choose density-based approachesis that, compared with partitioning-based approaches (e.g., K-Means [30])and hierarchical approaches (e.g., BIRCH [30]), density-based methods suchas DBSCAN define clusters as areas of higher density than the remainderof the data set, which is effective in finding arbitrarily-shaped clusters andis robust to noise. Moreover, density-based approaches are easy to adaptto support single-pass clustering. In the post network, we consider ε to bea similarity threshold to decide connectivity, and can be used to define thepost priority.956.4. Skeletal Graph ClusteringDefinition 21 Given a post p = (L, τ, a) in post network Gt(Vt, Et) andsimilarity threshold ε, the priority of p at time t (t ≥ pτ ), is defined aswt(p) =1e|t−pτ |∑q∈N (p)SF (p, q) (6.1)where N (p) is the subset of p’s neighbors with SF (p, q) > ε.Notice that post priority decays as time moves forward. Thus, postpriority needs to be continuously updated. In practice, we only store thesum∑q∈N (p) SF (p, q) with p to avoid frequent updates and compute wt(p)on demand.Skeletal Graph. With post priority computed, we use δ as a prioritythreshold to differentiate nodes in Gt(Vt, Et):• A post p is a core post if wt(p) ≥ δ;• It is a border post if wt(p) < δ but there exists at least one core postq ∈ N (p);• It is a noise post if it is neither core nor border, i.e., wt(p) < δ and thereis no core post in N (p).Intuitively, a post is a core post if it shares enough common entities withmany other posts. Neighbors of a core post are at least border posts, if notcore posts themselves. Core posts play a central role: if a core post p is foundto be a part of a cluster C, its neighboring (border or core) posts will alsobe a part of C. This property can be used in the single-pass clustering: if anincoming post p is “reachable” from an existing core post q, post p will beassigned to the cluster with q. Core posts connected by edges with similarityhigher than ε will form a summary of Gt(Vt, Et), that we call the skeletalgraph.Definition 22 Given post network Gt(Vt, Et) and density parameters (ε, δ),we define the skeletal graph as the subgraph of Gt(Vt, Et) induced by postswith wt(p) ≥ δ and edges with similarity higher than ε. We write Gt =Ske(Gt).Ideally, Gt(V t, Et) will retain important information in Gt. Empirically,we found that adjusting the granularity of (ε, δ) to make the size |V t| roughlyequal to 20% of |Vt| leads to a good balance between the quality of the skeletalgraph in terms of the information retained and its space complexity. Moretuning details can be found in Section 6.7.1.966.5. Incremental Cluster Evolution6.4.2 Skeletal Cluster IdentificationOne of the key ideas in our incremental cluster evolution tracking approach isto use the updating subgraph ∆Gt+1 between successive moments to main-tain the skeletal clusters. Post clusters are constructed from these skeletalclusers. 
Maintaining skeletal clusters can be done efficiently since the skele-tal graph is much smaller in size than the post graph it’s obtained from.Besides efficiency, skeletal cluster has the advantage of giving the correspon-dence between successive post clusters in a very small cost.Definition 23 Given Gt(Vt, Et) and the corresponding skeletal graph Gt(V t, Et),a skeletal cluster C is a connected component of Gt. A post cluster is a setof core posts and border posts generated from a skeletal cluster C, written asC = Gen(C), using the following expansion rules:• All posts in C form the core posts of C.• For every core post in C, all its neighboring border posts in Gt form theborder posts in C.In what follows, by cluster, we mean a post cluster, distinguished fromthe explicit term skeletal cluster. By definition, a core post only appears inone (post) cluster. If a border post is associated with multiple core posts indifferent clusters, this border post will appear in multiple (post) clusters.6.5 Incremental Cluster EvolutionIn this section, we discuss the incremental evolution of skeletal graph andpost clusters under the fading time window.6.5.1 Fading Time WindowFading (or decay) function and sliding time window are two common aggre-gation schemes used in time-evolving graphs (e.g., see [17]). Fading schemeputs a higher emphasis on newer posts, as captured by fading similarity inEq. (3.1). Sliding time window scheme (posts are first-in, first-out) is essen-tial because it provides a scope within which a user can monitor and trackthe evolution.Since clusters evolve quickly from moment to moment, even within agiven time window, it is important to highlight new posts and degrade oldposts using the fading scheme. Thus, we combine these two schemes and in-troduce a fading time window, as illustrated in Figure 6.4. In practice, users976.5. Incremental Cluster EvolutionUser Space1t toldG newGLen1t old t newG G G G  t'MomentsFigure 6.4: An illustration of the fading time window from time t to t + 1,where post priority may fade w.r.t. the end of time window. Gt will beupdated by deleting subgraph Gold and adding subgraph Gnew.can specify the length of the time window to adjust the scope of monitoring.Users can also choose different fading functions to penalize old posts andhighlight new posts in different ways. Let ∆t denote the interval betweenmoments. For simplicity, we abbreviate the moment (t + i · ∆t) as (t + i).When the time window slides from moment t to t + 1, the post networkGt(Vt, Et) will be updated to be Gt+1(Vt+1, Et+1). Suppose Gold(Vold, Eold)is the old subgraph (of Gt) that lapses at moment t+1 and Gnew(Vnew, Enew)is the new subgraph (of Gt+1) that appears (see Figure 6.4). Clearly,Gt+1 = Gt −Gold +Gnew (6.2)Let Len be the time window length. We assume Len > 2∆t, which makesVold ∩ Vnew = ∅. This assumption is reasonable in applications, e.g., we setLen to 1 week and ∆t to 1 day.6.5.2 Network Evolution OperationsWe analyze the evolution process of networks and clusters at each momentand abstract them into five primitive operators: +, −, , ↑, ↓. We classifythe operators based on the objects they manipulate: nodes or clusters, anddefine them below.Definition 24 Primitive node operations:• Gt+p: add a new post p into Gt(Vt, Et) where p 6∈ Vt. All the new edgesassociated with p will be constructed automatically by linkage search (ex-plained in Sec. 3.3);• Gt − p: delete a post p from Gt(Vt, Et) where p ∈ Vt. 
All the existingedges associated with p will be automatically removed from Et.986.5. Incremental Cluster Evolution• Gt: update the post priority scores in Gt.Composite node operations:• Gt ⊕ p = (Gt + p): add a post p into Gt(Vt, Et) where p 6∈ Vt andupdate the priority of related posts;• Gt 	 p = (Gt − p): delete a post p from Gt(Vt, Et) where p ∈ Vt andupdate the priority of related posts.Definition 25 Primitive cluster evolution operations:• +C: generate a new cluster C;• −C: remove an old cluster C;• ↑ (C, p): increase the size of C by adding post p;• ↓ (C, p): decrease the size of C by removing post p.Composite cluster evolution operations:• Merge(S) = +C−S: merge a set of clusters S into a new single clusterC and remove S;• Split(C) = −C + S: split a single cluster C into a set of new clustersS and remove C.In particular, composite node operations are designed to convenientlydescribe the adding/deleting of posts with priority scores updated in thesame time, and composite cluster operations are designed to capture theadvanced evolution patterns of clusters. Each operator defined above on asingle object can be extended to a set of objects, i.e., for a node set X ={p1, p2, · · · , p}, Gt +X = Gt + p1 + p2 + · · ·+ p. This is well defined since +is associative and commutative. We use the left-associative convention for‘−’: that is, we write A−B−C to mean (A−B)−C. These operators willbe used later in the formal description of the evolution procedures. Figure6.5(a) depicts the role played by the primitive operators in the tracking ofcluster evolutions from dynamic networks.6.5.3 Skeletal Graph Evolution AlgebraThe updating of skeletal graphs from Gt to Gt+1 is the core task in clusterevolution tracking. If we ignore the node priorities for a moment, the follow-ing formula shows different ways to compute the overlapping part in Gt+1and Gt, as illustrated in Figure 6.4:Gt+1 −Gnew = Gt −Gold = Gt+1 	Gnew = Gt 	Gold (6.3)996.5. Incremental Cluster EvolutionHowever, at the skeletal graph level, Ske(Gt+1−Gnew) 6= Ske(Gt−Gold):some core posts in Gt−Gold may no longer be core posts due to the removalof edges incident with nodes in Gold or simply due to the passing of time;some non-core posts may become core posts because of the adding of edgeswith nodes in Gnew. To measure the changes in the overlapping part, wedefine the following three components.Definition 26 Updated components in overlap:• S+ = Ske(Gt+1 −Gnew)− Ske(Gt+1 	Gnew): components of non-coreposts in Gt − Gold that become core posts in Gt+1 − Gnew due to theadding of Gnew;• S− = Ske(Gt − Gold) − Ske(Gt 	 Gold): components of core posts inGt−Gold that become non-core posts in Gt+1−Gnew due to the removingof Gold;• S = Ske(Gt	Gold)−Ske(Gt+1	Gnew): components of core posts inGt−Gold that become non-core posts in Gt+1−Gnew due to the passingof time.Based on Definition 26, from moment t to t + 1, the changes of coreposts in the overlapping part, i.e., Gt+1−Gnew (equivalently, Gt−Gold – seeFigure 6.4), can be updated using the components S+, S− and S. That is,Ske(Gt+1 −Gnew)− Ske(Gt −Gold)= (Ske(Gt+1 −Gnew)− Ske(Gt+1 	Gnew))−(Ske(Gt −Gold)− Ske(Gt 	Gold))−(Ske(Gt 	Gold)− Ske(Gt+1 	Gnew))= S+ − S− − S (6.4)Let Sold and Snew denote the sets of skeletal clusters in Gold and Gnewrespectively. 
Let Ṡold and Ṡnew denote the sets of skeletal clusters in Gold and Gnew, respectively. The following theorem characterizes the iterative and incremental updating of skeletal graphs from moment t to t + 1, and plays a central role in cluster evolution.

Theorem 1 From moment t to t + 1, the skeletal graph evolves by removing core posts in Gold, adding core posts in Gnew and updating core posts in the overlapping part. That is,

Ṡt+1 = Ṡt − Ṡold − Ṡ− − Ṡ◦ + Ṡnew + Ṡ+    (6.5)

Proof: Since operator '−' does not update post priority, we have Ske(Gt+1 − Gnew) = Ske(Gt+1) − Ske(Gnew) = Ṡt+1 − Ṡnew and Ske(Gt − Gold) = Ske(Gt) − Ske(Gold) = Ṡt − Ṡold. Then Ṡt+1 − Ṡnew − Ṡt + Ṡold = Ṡ+ − Ṡ− − Ṡ◦, and the conclusion follows. □

Figure 6.5: (a) The relationships between primitives and evolutions. Each box represents an evolution object and the arrows between them describe inputs/outputs. (b) The evolutionary behavior table for clusters when adding or deleting a core post p:

                       |Nc(p)| = 0    |Nc(p)| = 1    |Nc(p)| ≥ 2
Add a core post p      +              ↑              Merge
Delete a core post p   −              ↓              Split

Theorem 1 indicates that we can incrementally maintain the skeletal clusters Ṡt+1 from Ṡt. Since we define (post) clusters based on skeletal clusters, this incremental updating of skeletal clusters essentially enables the incremental updating of cluster evolution.

6.5.4 Incremental Cluster Evolution

Let St = Clu(Gt) denote the set of clusters obtained from the post network Gt. Notice that noise posts in Gt do not appear in any cluster, so the number of posts in St is typically smaller than |Vt|. Next, we explore the incremental cluster evolution problem at two levels: node-by-node updating and subgraph-by-subgraph updating.

Node-by-Node Evolution. The basic operations underlying cluster evolution are the cases where St is modified by the addition or deletion of a single post. In the following, we analyze the evolution of clusters when adding or deleting a post p. When adding p, we let Nc(p) denote the set of clusters that p's neighboring core posts belong to before p is added. When deleting p, let Nc(p) denote the set of clusters that p's neighboring core posts belong to after p is removed. |Nc(p)| = 0 means p has no neighboring core posts. Notice that Merge and Split are composite operations and can be decomposed into a series of primitive cluster operations. We show the evolution behaviors of clusters in Figure 6.5(b), explain the details below, and give a code sketch of the dispatch after the two case lists.

(a) Addition: St + {p}
If p is a noise post after being added to Gt, ignore p. If p is a border post, add p to each cluster in Nc(p). Else, p is a core post and we do the following:
• If |Nc(p)| = 0: apply +C, where C = {p} ∪ N(p);
• If |Nc(p)| = 1: apply ↑(C, {p} ∪ N(p)), where C is the lone cluster in Nc(p);
• If |Nc(p)| ≥ 2: apply Merge = +C − Σ_{C′ ∈ Nc(p)} C′, where C = Nc(p) ∪ {p} ∪ N(p).

(b) Deletion: St − {p}
If p is a noise post before being deleted from Gt, ignore p. If p is a border post, delete p from each cluster in Nc(p). Else, p is a core post and we do the following:
• If |Nc(p)| = 0: apply −C, where p ∈ C;
• If |Nc(p)| = 1: apply ↓(C, {p} ∪ N(p));
• If |Nc(p)| ≥ 2: apply Split = −C + Σ_{C′ ∈ Nc(p)} C′, where p ∈ C before the deletion.
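The addition row of Figure 6.5(b) can be written as a small dispatcher; a minimal sketch, assuming clusters are stored as a dict from cluster id to post set and `cluster_of` maps each core post to its cluster id:

def evolve_on_core_add(p, neighbors, cluster_of, clusters):
    # Node-by-node dispatch for adding a core post p (Figure 6.5(b),
    # addition row).  `neighbors` is p's neighbor set N(p) in the post
    # network.  The deletion row (-, ↓, Split) is symmetric.
    nc = {cluster_of[q] for q in neighbors if q in cluster_of}   # Nc(p)
    members = {p} | set(neighbors)                               # {p} ∪ N(p)
    if len(nc) == 0:                       # no neighboring core posts: +C
        cid, op = max(clusters, default=-1) + 1, '+'
        clusters[cid] = members
    elif len(nc) == 1:                     # one neighboring cluster: grow (↑)
        cid, op = nc.pop(), 'grow'
        clusters[cid] |= members
    else:                                  # several neighboring clusters: Merge
        cid, op = max(clusters, default=-1) + 1, 'merge'
        clusters[cid] = members.union(*(clusters.pop(c) for c in nc))
        # a full implementation would also remap cluster_of for the core
        # posts absorbed from the merged clusters
    cluster_of[p] = cid
    return op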
Subgraph-by-Subgraph Evolution. When dynamic networks such as post networks in social streams surge quickly, node-by-node processing of cluster evolution leads to poor performance. To accelerate it, we consider a subgraph-by-subgraph updating approach. Let Clu(Gnew) = Snew and Clu(Gold) = Sold be the cluster sets of the graphs Gnew and Gold, and let St be the set of all clusters at moment t. As the time window moves forward to moment t + 1, if we add Gnew to the network Gt, clusters evolve as follows:

Clu(Gt + Gnew) = Gen(Ske(Gt + Gnew))
  = Gen(Ske(Gt) + Ske(Gnew) + Ṡ+ − Ṡ◦)    (Definition 26)
  = St + Snew + S+ − S◦    (6.6)

where S+ = Gen(Ṡ+) and S◦ = Gen(Ṡ◦). Similarly, if we remove Gold from the network Gt, clusters evolve as follows:

Clu(Gt − Gold) = Gen(Ske(Gt − Gold))
  = Gen(Ske(Gt) − Ske(Gold) − Ṡ−)    (Definition 26)
  = St − Sold − S−    (6.7)

where S− = Gen(Ṡ−). Based on Equations (6.6) and (6.7), from moment t to t + 1, the set of clusters can be incrementally updated by the iterative computation

St+1 = Clu(Gt+1) = Clu(Gt − Gold + Gnew)
  = St − Sold − S− + Snew + S+ − S◦    (6.8)

Equation (6.8) can also be verified by applying the Gen function to both sides of Equation (6.5). Naturally, Equation (6.8) provides the theoretical basis for the incremental computation of cluster evolution: as the post network evolves from Gt to Gt+1, we do not compute St+1 from Gt+1 from scratch. Instead, we incrementally update St by means of the five cluster sets appearing in Equation (6.8), using simple set operations. Since Gold and Gnew are usually very small compared with Gt, these five cluster sets are also small, so we can generate St+1 from them quickly. The details of the incremental computation are discussed in Section 6.6.

6.6 Incremental Algorithms

The traditional approach of decomposing an evolving graph into a series of snapshots suffers in both quality and performance, since clusters are generated from scratch and matched heuristically at each moment. To overcome this limitation, we propose an incremental tracking framework, as introduced in Section 6.5 and illustrated in Figure 6.2(b). In this section, we realize our incremental computation with Algorithm 8 for incremental cluster maintenance (ICM) and Algorithm 9 for cluster evolution tracking (eTrack). Since at each moment |Vold| + |Vnew| ≪ |Vt|, our algorithms save a great deal of computation by adjusting clusters incrementally rather than generating them from scratch.

Bulk Updating. Traditional incremental computation on dynamic graphs usually treats the addition/deletion of nodes or edges one by one [18, 22]. However, in a real scenario, since social posts arrive at high speed, post-by-post incremental updating leads to very poor performance. In this paper, we speed up the incremental computation of St by bulk updating. Clearly, updating St with a single node {p} is a special case of bulk updating. Here, a bulk corresponds to a cluster of posts, and we "lift" the post-by-post updating of St to the bulk level. Recall that Nc(p) is the neighboring cluster set of a core post p. To understand the bulk updating in Algorithm 8, for a cluster C we define Nc(C) as the neighboring cluster set of the posts in C, i.e., Nc(C) = ∪p∈Ċ Nc(p), where Ċ = Ske(C). When C is added to or deleted from St as a bulk, the size of Nc(C) decides the evolution pattern of clusters from moment t to t + 1, as shown in Figure 6.5(b). Since C is usually a small subgraph, we consider that a bulk operation can be done in constant time. Since the internal edges of a subgraph C can be ignored when determining Nc(C), bulk updating is more efficient than node-by-node updating.
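A sketch of the Nc(C) computation that drives the bulk dispatch; internal edges of the bulk are skipped, so only boundary edges are examined (`neighbors` and `cluster_of` are the same hypothetical stores as above):

def bulk_neighbor_clusters(bulk_core_posts, neighbors, cluster_of):
    # Nc(C) for a bulk C: the clusters of the core posts adjacent to C,
    # ignoring C's internal edges.  |Nc(C)| then selects the pattern of
    # Figure 6.5(b): 0 -> +/-, 1 -> grow/shrink, >= 2 -> Merge/Split.
    nc = set()
    for p in bulk_core_posts:
        for q in neighbors[p]:
            if q in bulk_core_posts:      # internal edge of the bulk: skip
                continue
            if q in cluster_of:           # neighboring core post
                nc.add(cluster_of[q])
    return nc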
Incremental Cluster Maintenance (ICM). The steps of incremental cluster maintenance (ICM) from any moment t to t + 1 are summarized in Algorithm 8. The ICM algorithm follows the iterative computation of Equation (6.8), that is, St+1 = St − Sold − S− − S◦ + Snew + S+. As analyzed in Section 6.5.4, each bulk addition and bulk deletion has three possible evolution behaviors, decided by the size of Nc(C). Lines 3-13 deal with deleting a bulk C, handling the three patterns {−, ↓, Split}. Lines 15-26 deal with adding a bulk C, handling the other three patterns {+, ↑, Merge}. Supposing there are n bulk updates in ICM, the time complexity of ICM is O(n). Since a bulk operation is generally completed in constant time, ICM is an efficient single-pass incremental computation algorithm.

Cluster Evolution Tracking (eTrack). Given a dynamic network Gi:j and the set of clusters Si at the start moment i, the eTrack algorithm tracks the primitive cluster evolution operations at each moment, working on top of the ICM algorithm (Line 3). We summarize the steps of eTrack in Algorithm 9. Basically, eTrack monitors the changes of clusters effected by ICM at each moment. If a cluster is unchanged, eTrack takes no action; otherwise, eTrack determines the corresponding case and outputs the cluster evolution patterns (Lines 4-12). Notice that in Lines 5-8, if a cluster C in St has ClusterId id, we use the convention that St(id) = C to access C by id, and St(id) = ∅ means there is no cluster in St with ClusterId id. In particular, Lines 7-8 mean that a cluster in St evolves into a cluster in St+1 by first deleting the posts in St(id) − St+1(id) and then adding the posts in St+1(id) − St(id). As an efficient monitoring algorithm, once we obtain St+1 incrementally by ICM, the time complexity of eTrack is linear in the number of clusters in St and St+1 at each moment.
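The monitoring loop of eTrack amounts to diffing consecutive cluster snapshots by ClusterId once ICM has produced St+1. A minimal sketch, with illustrative operation labels:

def track_evolution(prev, curr):
    # `prev` and `curr` map ClusterId -> set of posts (S_t and S_{t+1});
    # assumes ICM keeps ids stable for surviving clusters.
    ops = []
    for cid, posts in curr.items():
        if cid in prev:
            removed, added = prev[cid] - posts, posts - prev[cid]
            if removed:
                ops.append(('shrink', cid, removed))   # ↓(C, S_t(id) - S_{t+1}(id))
            if added:
                ops.append(('grow', cid, added))       # ↑(C, S_{t+1}(id) - S_t(id))
        else:
            ops.append(('birth', cid))                 # +C
    for cid in prev.keys() - curr.keys():
        ops.append(('death', cid))                     # -C
    return ops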
6.7 Experiments

In this section, we first discuss how to tune the construction of the post network and skeletal graph to find the best entity extraction and density parameters. Then, we test the quality and performance of the cluster evolution tracking algorithms on two social streams, Tech-Lite and Tech-Full, that we crawled from Twitter. Our event detection baseline covers the major techniques reported in [48, 55, 66, 67]. Our evolution tracking baseline captures the essence of the state-of-the-art algorithms reported in [38]. All algorithms are implemented in Java. We use the graph database Neo4J (http://neo4j.org/) to store and manipulate the post network.

Datasets. All datasets are crawled from Twitter via the Twitter API. Although our cluster evolution tracking algorithm works regardless of the domain, in order to facilitate evaluation, we make the datasets domain specific. The crawling proceeds as follows. We built a technology-domain dataset called Tech-Lite by aggregating all the timelines of users listed in the Technology category of "Who to follow" (http://twitter.com/who_to_follow/interests) and their retweeted users. Tech-Lite has 352,328 tweets and 1,402 users, and its streaming rate is about 11,700 tweets/day. Based on the intuition that the followees of users in the Technology category are most likely to be in the same domain, we obtained a larger technology social stream called Tech-Full by collecting all the timelines followed by users in the Technology category. Tech-Full has 5,196,086 tweets, created by 224,242 users, with a streaming rate of about 7,216 tweets/hour. Both Tech-Lite and Tech-Full include retweets and span Jan. 1 to Feb. 1, 2012. Since each tweet corresponds to a node in the post network, both Tech-Lite and Tech-Full produce highly dynamic networks. Notice that the performance of our single-pass incremental approach is mainly affected by the streaming rate, rather than by the dataset size.

6.7.1 Tuning the Skeletal Graph

Post Preprocessing. As described in Section 6.4, we extract entities from posts with a POS tagger. One alternative approach to entity extraction is to use hashtags. However, only 11% of the tweets in our dataset have hashtags, which leaves many post pairs in the dataset with no similarity score between them. Another approach is to simply tokenize tweets into unigrams and treat the unigrams as entities; we call this the "Unigrams" approach, as discussed in [55]. Table 6.2(a) compares the three entity extraction approaches in the first time window of the Tech-Full social stream. With "Unigrams", the number of entities is obviously larger than in the other two approaches, but the number of edges between posts tends to be smaller, because tweets written by different users usually share very few common words even when they talk about the same event. The "Hashtags" approach also produces a smaller number of edges, core posts and events, since it generates a much sparser post network. Overall, the "POS-Tagger" approach discovers more similarity relationships between posts and produces more core posts and events for the same social stream and parameter setting.

Density Parameters. The density parameters (ε, δ) control the construction of the skeletal graph. Clearly, the higher the density parameters, the smaller and sparser the skeletal graph. Figure 6.6 shows the number of core posts, core edges and events as a percentage of the numbers for ε = 0.3, as δ increases from 0.3 to 0.8. Results are obtained from the first time window of the Tech-Full social stream.

Figure 6.6: The trends of the number of core posts, core edges and events when increasing δ from 0.3 to 0.8. We set δ = ε = 0.3 as the 100% basis.

We can see that the rate of decrease of #events is higher than the rates for #core posts and #core edges once δ > 0.4, because events are less likely to form in sparser skeletal graphs. More small events can be detected with lower density parameters, but the computational cost increases because of larger and denser skeletal graphs. Big events, however, are not very sensitive to these density parameters. We set ε = 0.3, δ = 0.5 as a trade-off between the size and number of events on one hand and processing efficiency on the other.

6.7.2 Cluster Evolution Tracking

Ground truth. To generate the ground truth, we crawled news articles from January 2012 from well-known technology websites such as TechCrunch, Wired and CNET, without looking at tweets. We then treated the titles of these news articles as posts and applied our event tracking algorithm to extract event evolution patterns. Finally, a total of 20 major events with life cycles were identified as the ground truth. Typical events include "happy new year", "CES 2012", "sopa wikipedia blackout", etc. To find smaller and less noticeable events, we use Google Trends for Search (http://www.google.com/trends/), which shows the traffic trends of keywords appearing in Google Search along the time dimension. If an event-indicating phrase has a volume peak in Google Trends at a specific time, we say this event is sufficiently validated by the real world.
We validate the correctness of an event Ci by the following process: we pick the top 3 entities of Ci ranked by frequency and search for them in Google Trends; if the traffic trend of these top entities has a distinct peak at a time near Ci, we consider that Ci corresponds to a real-world event widely witnessed by the public. Four examples of Google Trends peaks are shown in Figure 6.7. It is not surprising to find that the birth of an event in social streams is usually earlier than its appearance in Google Trends.

Figure 6.7: Examples of Google Trends peaks in January 2012 ("SOPA Wikipedia", "SOPA Facebook", "SOPA Megaupload" and "Apple iBooks", between Jan 15 and Jan 29). We validate the events generated by eTrack by checking the existence of volume peaks at a nearby time moment in Google Trends. Although these peaks can detect bursty events, Google Trends cannot discover merging/splitting patterns.

Cluster Annotation. Considering the huge volume of posts in a cluster, it is important to summarize and present a post cluster as a conceptual event to aid human perception. In related work, Twitinfo [55] represents an event it discovers from Twitter by a timeline of tweets, showing tweet activity by volume over time. However, it is tedious for users to read tweets one by one to figure out the details of an event. In this paper, we summarize a snapshot of a cluster by a word cloud [29], in which the font size of a word indicates its popularity. Compared with Twitinfo, a word cloud provides a summary of the cluster at a glance and is much easier for a human to read.

Comparison with DynDense [5]. DynDense works on entity graphs, ignoring the majority of tweets, since they have fewer than 2 entities. DynDense models events as dense subgraphs with size smaller than Nmax, usually set to 5. We observed that, with fewer than Nmax entities, many DynDense results are difficult to interpret as specific real events, e.g., a subgraph with the 3 nodes {"HP", "Microsoft", "Google"}. In contrast, our approach models events as large dense post clusters, which can be validated as real events by checking the highly correlated core posts in the cluster.

Baseline 1: Peak-Detection. In recent works [48, 55, 66, 67], events are generally detected as volume peaks of phrases over time in social streams. These approaches share the same spirit: aggregate the frequency of event-indicating phrases at each moment to build a histogram, and generate events by detecting volume peaks in the histogram. We design two variants of Peak-Detection to capture the major techniques used by these state-of-the-art approaches (a sketch of the shared scheme follows the list):
• Baseline 1a: HashtagPeaks, which aggregates hashtags;
• Baseline 1b: UnigramPeaks, which aggregates unigrams.
Notice that both baselines above are for event detection only.
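The shared histogram-and-peaks scheme can be sketched as follows. This is a simplified reconstruction, not the exact method of [48, 55, 66, 67]; `phrase_of` extracts hashtags (Baseline 1a) or unigrams (Baseline 1b) from a post, and the test against a multiple of the phrase's mean volume is an assumed placeholder for a peak detector.

from collections import Counter

def detect_peaks(stream, phrase_of, threshold=2.0):
    # `stream` yields (moment, post) pairs; build one volume histogram
    # per phrase, then report (moment, phrase) pairs whose volume rises
    # well above that phrase's mean volume.
    hist = {}
    for moment, post in stream:
        for phrase in phrase_of(post):
            hist.setdefault(phrase, Counter())[moment] += 1
    events = []
    for phrase, counts in hist.items():
        mean = sum(counts.values()) / len(counts)
        for moment, freq in counts.items():
            if freq > threshold * mean:      # a volume peak: emit an event
                events.append((moment, phrase, freq))
    return sorted(events)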
Lists of the top 10 events detected by HashtagPeaks and UnigramPeaks are presented in Figure 6.8.

Figure 6.8: Lists of the top 10 events detected from Twitter Technology streams in January 2012 by the baselines HashtagPeaks, UnigramPeaks and Louvain, and by our incremental tracking approach eTrack:

HashtagPeaks     UnigramPeaks   Louvain                  eTrack
CES              google         Apple iphone ipad        CES conference
SOPA             ces            CES ultrabook tablet     SOPA PIPA
EngadgetCES      apple          Google search privacy    Hug new year
opengov          video          Week's Android games     RIM new CEO
gov20            sopa           Kindle Netflix app       Yahoo jerry yang
CES2012          twitter        Internet people time     Samsung Galaxy Nexus
PIPA             year           Hope weekend             Apple iBooks
opendata         facebook       SOPA Megaupload          Facebook IPO News
StartupAmerica   app            SOPA PIPA Wikipedia      Martin Luther King
win7tech         iphone         Facebook IPO             Tim Cook Apple stock

Some highly frequent hashtags like "#opengov" and "#opendata" are not designed for event indication, which hurts precision. UnigramPeaks uses the unigrams extracted in the social stream preprocessing stage and has better quality than HashtagPeaks. However, both are limited in their representation of events, because the internal structure of events is missing. Besides, although these peaks can detect bursty words, they cannot discover cluster evolution patterns such as merging/splitting. For example, in Figure 6.7, there is no way to know that "Apple announced iBooks" is a split from the earlier big event "SOPA", as illustrated in detail in Figure 6.9.

Baseline 2: Community Detection. A community in a large network refers to a subgraph with dense internal connections and sparse connections to other communities. It is possible to define an event as a community of posts. The Louvain method [14], based on modularity optimization, is the state-of-the-art community detection method in terms of performance. We design a baseline called "Louvain" that detects events defined as post communities. The top 10 events generated by Louvain are shown in Figure 6.8. As we can see, not every result detected by the Louvain method is meaningful. For example, "Apple iphone ipad" and "Internet people time" are too vague to correspond to any concrete real event. The reason is that, although the Louvain method ensures every community has relatively dense internal and sparse external connections, it cannot guarantee that every node in a community is important and has sufficiently high connectivity with the other nodes in the same community. It is quite possible for a low-degree node to belong to a community only because it has zero connectivity with other communities. Furthermore, noise posts are quite prevalent in Twitter, and they negatively impact the Louvain method.

Baseline 3: Pattern-Matching. We design a baseline to track the evolution patterns of clusters between snapshots. In graph mining, the "divide-and-conquer" approach of decomposing the evolving graph into a series of snapshot graphs, one per moment, is a traditional way to tackle evolving-graph problems (e.g., [38]). As an example, Kim et al. [38] first cluster individual snapshots into quasi-cliques and then map them across adjacent snapshots over time. Inspired by this approach, we design a baseline for cluster evolution tracking, which characterizes the cluster evolution at consecutive moments by identifying the following heuristic patterns (a sketch is given below):
• If |Ct ∩ Ct+1| / |Ct ∪ Ct+1| ≥ κ and |Ct| ≤ |Ct+1|, then Ct+1 = ↑Ct;
• If |Ct ∩ Ct+1| / |Ct ∪ Ct+1| ≥ κ and |Ct| > |Ct+1|, then Ct+1 = ↓Ct;
where Ct and Ct+1 are any two clusters detected at moments t and t + 1 respectively, and κ is the minimal commonality required to say that Ct and Ct+1 are snapshots of the same cluster. A higher κ results in higher precision but lower recall of the evolution tracking. Empirically, we set κ = 90% to guarantee quality. It is worth noting that this baseline generates the same clusters as the eTrack algorithm, but with a non-incremental evolution tracking approach.
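The matching rule of Baseline 3 is a straightforward Jaccard test between snapshot clusters; a sketch:

def match_snapshots(clusters_t, clusters_t1, kappa=0.9):
    # Pair clusters (as post sets) of two consecutive snapshots whose
    # Jaccard similarity reaches `kappa`, labelling each pair as growth
    # (↑) or shrinkage (↓) by comparing sizes.
    matches = []
    for ct in clusters_t:
        for ct1 in clusters_t1:
            jaccard = len(ct & ct1) / len(ct | ct1)
            if jaccard >= kappa:
                op = 'grow' if len(ct) <= len(ct1) else 'shrink'
                matches.append((op, ct, ct1))
    return matches

# e.g. match_snapshots([{1, 2, 3, 4, 5, 6, 7, 8, 9}],
#                      [{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}]) yields one 'grow' pair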
Precision and Recall. To measure the quality of event detection, we use HashtagPeaks, UnigramPeaks and Louvain as baselines to compare with our algorithm eTrack. It is worth noting that Baseline 3 is designed for tracking event evolution patterns between moments, so we omit it here. We compare the precision and recall of the top 20 events generated by the baselines and eTrack and show the results in Table 6.2(b). Compared with the ground truth, HashtagPeaks and UnigramPeaks have rather low precision and recall scores, because of their poor ability to capture event bursts. Notice that multiple extracted events may correspond to the same ground-truth event. eTrack outperforms the baselines in both precision and recall. Since many events are discussed in social media but are not very noticeable on news websites, we also validate the precision of the generated events using Google Trends. As we can see, HashtagPeaks and UnigramPeaks perform poorly under Trends validation, since the words they generate are less informative and not very event-indicating. eTrack attains a precision of 95% under Google Trends; its only failed result is "Samsung galaxy nexus", whose volume is steadily high without obvious peaks in Google Trends, perhaps because the social stream is very dynamic. Louvain is worse than eTrack. The results show that eTrack is significantly better than the baselines in quality.

Life Cycle of Cluster Evolution. Our approach is capable of tracking the whole life cycle of a cluster, from birth to death. We explain this using the example of "CES 2012", a major consumer electronics show held in Las Vegas from January 10 to 13. As early as Jan 6, our approach had already detected some discussions about CES and generated an event about CES. On Jan 8, most people talked about "CES prediction", and on Jan 9 the highlighted topic was "CES tomorrow", along with some hearsay about the "ultrabook" that would be shown at CES. After the actual event happened on Jan 10, the event grew distinctly bigger, with lots of products, news and messages spreading over the social network; this situation continued until Jan 13, the last day of CES. Afterwards, the discussions became weaker and continued until Jan 14, when "CES" was no longer the biggest mention of the day but still appeared in some discussions. Compared with our approach, Baselines 1 and 2 can detect the emergence of "CES" with a frequency count at each moment, but no trajectory is generated. Baseline 3 can track only a very coarse trajectory of this event, i.e., from Jan 10 to Jan 12. The reason is that if an event changes rapidly and many posts at consecutive moments cannot be associated with each other, Baseline 3 fails to track the evolution. Since posts in social streams usually surge quickly, our approach is superior to the baselines. An illustration of the "CES" evolution trajectory and extended discussion can be found in [46].
Cluster Merging & Splitting. Figure 6.9 illustrates an example of cluster merging and splitting generated by the algorithm eTrack. eTrack detected the event of SOPA (Stop Online Piracy Act) and Wikipedia on Jan 16, because on that day Wikipedia announced the blackout on Wednesday (Jan 18) to protest SOPA.

Figure 6.9: The merging and splitting of "SOPA" and "Apple" (Jan 16 to Jan 21, 2012). At each moment, an event is annotated by a word cloud. Baselines 1 and 2 only work for the detection of newly emerging events and are not applicable to the tracking of merging and splitting dynamics. The evolution trajectories of eTrack and Baseline 3 are depicted by solid and hollow arrows, respectively.

This event grew distinctly on Jan 17 and Jan 18, inducing more people in the social network to discuss the topic. At the same time, another event was detected on Jan 18, discussing Apple's products. On Jan 19, the SOPA event and the Apple event actually merged, because Apple joined the SOPA protest and many Apple products, such as iBooks in education, are directly related to SOPA. This event evolved on Jan 20 by adding more discussions about iBooks 2. Apple iBooks 2 was actually unveiled on Jan 21; while this new product gained lots of attention, people who talked about iBooks no longer talked about SOPA. Thus, on Jan 21, the SOPA-Apple event was split into two events, which evolved independently afterwards. Unfortunately, this merging and splitting process cannot be tracked by any of the baselines, which output only independent events.

6.7.3 Running Time of Evolution Tracking

Recall that Baselines 1 and 2 are for event identification in a fixed time window. For evolution tracking, we measure how Baseline 3 and eTrack scale w.r.t. both the time window length and the step length. We use both the Tech-Lite and Tech-Full streams, and set the time step interval ∆t = 1 day for Tech-Lite and ∆t = 1 hour for Tech-Full, to track events at different time granularities. The streaming post rates for Tech-Lite and Tech-Full are 11,700/day and 7,126/hour respectively.

Figure 6.10: The running time on the two datasets as the time window length and step length are adjusted: (a) varying the time window length; (b) varying the step length.

Figure 6.10(a) shows the running time of eTrack as we increase the time window length: for a time window of 10∆t in Tech-Full, our approach finishes post preprocessing, post network construction and event tracking in just 3 minutes. A key observation is that the running time of eTrack does not depend on the overall size of the dataset; rather, it depends on the streaming speed of posts within ∆t. Thus, Tech-Lite takes more time than Tech-Full, since it streams more posts within ∆t. Figure 6.10(b) shows that if we fix the time window length at 10∆t and increase the step length of the sliding time window, the running time of eTrack grows nearly linearly. Compared with our incremental computation, Baseline 3 has to process the posts in the whole time window from scratch at each moment, so its running time stays steadily high. If the step length is larger than 4∆t in Tech-Full, eTrack has no running time advantage over Baseline 3, because a large part of the post network is updated at each moment. However, this extreme case is rare: in real scenarios, the step length is much smaller than the time window length, so our approach is much more efficient than the baseline.

6.8 Discussion and Conclusion

Our main goal is to track the evolution of events over social streams such as Twitter.
To that end, we extract meaningful information from noisy post streams and organize it into an evolving network of posts under a sliding time window. We model events as sufficiently large clusters of posts sharing the same topics, and propose a framework that describes event evolution behaviors using a set of primitive operations. Unlike previous approaches, our evolution tracking algorithm performs incremental updates and efficiently tracks event evolution patterns in real time. We experimentally demonstrate the performance and quality of our algorithm on two real datasets crawled from Twitter. As a natural progression, it would be interesting in the future to investigate tracking the evolution of social emotions about products, with its obvious application to business intelligence.

Algorithm 8: ICM: Incremental Cluster Maintenance
Input: St, Sold, Snew, S−, S+, S◦
Output: St+1
1   St+1 = St;
    // Delete Sold ∪ S− ∪ S◦
2   for each cluster C in Sold ∪ S− ∪ S◦ do
3     Ċ = Ske(C);
4     Nc(C) = ∪p∈Ċ Nc(p);
5     if |Nc(C)| = 0 then
6       remove cluster C from St+1;
7     else if |Nc(C)| = 1 then
8       delete C from cluster C′, where C′ ∈ Nc(C);
9     else
10      remove the cluster that C belongs to from St+1;
11      for each cluster C′ ∈ Nc(C) do
12        assign a new cluster id to C′;
13        add C′ into St+1;
    // Add Snew ∪ S+
14  for each cluster C in Snew ∪ S+ do
15    Ċ = Ske(C);
16    Nc(C) = ∪p∈Ċ Nc(p);
17    if |Nc(C)| = 0 then
18      assign a new cluster id to C and add C to St+1;
19    else if |Nc(C)| = 1 then
20      add C into cluster C′, where C′ ∈ Nc(C);
21    else
22      assign a new cluster id to C;
23      for each cluster C′ ∈ Nc(C) do
24        C = C ∪ C′;
25        remove C′ from St+1;
26      add C into St+1;
27  return St+1;

Algorithm 9: eTrack: Cluster Evolution Tracking
Input: G = {Gi, Gi+1, · · · , Gj}, Si
Output: Primitive cluster evolution operations
1   for t from i to j do
2     obtain Sold, Snew, S−, S+, S◦ from Gt+1 − Gt;
3     St+1 = ICM(St, Sold, Snew, S−, S+, S◦);
4     for each cluster C ∈ St+1 do
5       id = ClusterId(C);
6       if St(id) ≠ ∅ then
7         output ↓(C, St(id) − St+1(id));
8         output ↑(C, St+1(id) − St(id));
9       else +C;
10    for each cluster C ∈ St do
11      id = ClusterId(C);
12      if St+1(id) = ∅ then −C;

(a) Results of different entity extraction approaches:
Methods      #edges    #core posts   #core edges   #events
Hashtags     182905    6232          28964         196
Unigrams     142468    15070         46783         430
POS-Tagger   357132    21509         47808         470

(b) Precision and recall of top 50 events:
Methods        Precision (major events)   Recall (major events)   Precision (G-Trends)
HashtagPeaks   0.40                       0.30                    0.25
UnigramPeaks   0.45                       0.40                    0.20
Louvain        0.60                       0.55                    0.75
eTrack         0.80                       0.80                    0.95

Table 6.2: Tuning the post network.

Chapter 7
Crowdsourcing-Based User Study

A user study is an effective way to determine ground truth or to evaluate the correctness of a hypothesis. Traditionally, user studies are conducted in an "offline" mode; e.g., domain experts are employed to mark the truth, or students in a lab are invited to annotate images. With the rise of the Internet, crowdsourcing has become a popular mechanism behind user studies. Crowdsourcing is the process of obtaining needed services or content by soliciting contributions from a large group of online users. Compared with traditional offline evaluation, crowdsourcing has the clear advantage of engaging a large number of users who are ready to work at flexible times for a fairly low price. However, crowdsourcing easily runs into the low-quality user problem, since these users are typically not experts in the crowdsourcing tasks, and they are motivated by earning money.
The quality evaluation of users thus becomes a crucial problem in crowdsourcing-based user studies. In this chapter, we analyze the problem and propose effective techniques, such as Expectation-Maximization with Qualification (EMQ), to conquer the challenges.

7.1 Introduction

In this thesis, we studied the cohesion, context and evolution problems of stories and events in unstructured social streams. In Chapter 1, we mentioned that the approaches proposed in this thesis are based on two hypotheses: (1) when modeling social streams, users prefer correlated posts to individual posts; (2) to model stories/events in social streams, a structural approach is better than frequency-based and LDA-based approaches. These two fundamental hypotheses determine that we model social streams as post networks and use graph mining approaches to mine stories and events. In this chapter, we conduct a crowdsourcing-based user study to verify these two important hypotheses in social stream mining. All crowdsourcing tasks are implemented on Amazon Mechanical Turk (MTurk). Given that a large number of workers on MTurk are bots and spammers, the most critical challenge is worker quality control. In the related work section, we summarize existing techniques for worker quality control, including majority voting, minimum time constraints, qualification tests, etc. However, none of them solves the "Smart Spammer" problem, in which workers pass the qualification test but then perform like spammers to get the reward with minimal work. To deal with this challenge, we propose Expectation-Maximization with Qualification (EMQ), which is capable of measuring a worker's quality in crowdsourcing and detecting Smart Spammers among all qualified workers. As an iterative process, EMQ recursively updates the probability of a worker being a valuable worker until convergence, and we call this probability the quality score of the worker. Finally, for a crowdsourcing task composed of a question and several options, the answer to the question is the option obtaining the highest votes, where each vote is weighted by the quality score of its worker.

We organize this chapter as follows. In Section 7.2, we introduce existing techniques for worker quality control in crowdsourcing and review their pros and cons. In Section 7.3, we introduce the two hypotheses that support the modeling of social streams and stories/events in this thesis. Section 7.4 introduces the quality control workflow used in the crowdsourcing tasks of this thesis; in particular, we introduce Expectation-Maximization with Qualification (EMQ), an advanced approach to evaluating the quality of workers. We show experimental results in Section 7.5.

7.2 Related Work

User (or worker) quality control is crucially important for guaranteeing the quality of submitted work in crowdsourcing. As the truth of each crowdsourcing task is either unavailable or very time-consuming to obtain, and workers are primarily motivated by the reward, worker quality control for crowdsourcing tasks is very challenging. For example, one might expect that a higher reward leads to higher-quality answers. However, as pointed out by [24], there is no clear correlation between the reward and the final quality. The reason is that increasing the price is believed to attract spammers (i.e., Turkers who cheat, not really performing the job, but using robots or answering randomly).
In this section, the state-of-the-art techniques for quality control in crowdsourcing are summarized.

Majority Voting. The traditional approach to improving the answer quality of a task is to assign the same task to a large number of workers and then take a majority vote [60]. However, this approach is costly, as each worker needs to be paid.

Minimum Time Constraint. This technique sets a minimum time cost for a task, which forces workers to read and think about the task for a time span above the minimum, e.g., 20 seconds, before making their decision. This mechanism is supported by Amazon MTurk and has proved effective at blocking bots and spammers [12]: bots typically submit the work faster than any human, e.g., in less than 1 second, and spammers usually do not read the instructions carefully and tend to make decisions faster than normal workers.

Approval Rate Constraint. As mentioned in [16], the Requester can require that all workers meet a particular qualification, such as sufficient accuracy on a small test set or a minimum percentage of previously accepted submissions. In [72], researchers only released HITs (Human Intelligence Tasks) to two groups of Turkers: those with (1) the Master's Qualification (a qualification awarded by Amazon) and (2) a default custom qualification requiring the Turkers to have completed at least 1,000 HITs with a 95% approval rating. They reported that Turkers with high approval ratings can achieve quality as high as that of a skilled crowd, namely a group of well-trained graduate students. The disadvantage of this approach is that it requires workers to have long historical submission records for the approval rate to be computed meaningfully.

Check and Reject. This approach manually checks the answers provided by the workers and rejects the work if the Requester feels the submission is of low quality. The Requester also has the option of rejecting the work of individual workers, in which case these workers are not paid. However, manually checking all answers is time-consuming and requires a lot of human labor.

Qualification Test. The qualification test is a widely used technique for worker quality control in crowdsourcing. For instance, Yashar [58] designed gold units as a qualification test for bilingual translation: they provide an English sentence and a Spanish sentence, and then ask Turkers yes/no whether one is a translation of the other. The downside of this approach is that workers need to spend a considerable amount of time completing the qualification test before the actual work.

Expectation-Maximization. Aditya et al. [60] mention that the Expectation-Maximization algorithm can be used to estimate worker quality. These algorithms collect annotations from humans and perform disagreement-based analysis to deduce the true answers. Ipeirotis et al. [36] discuss quality management on Amazon Mechanical Turk. They propose a solution based on Expectation-Maximization: the algorithm iterates until convergence, following two steps: (1) estimate the correct answer for each task, using labels assigned by multiple workers and accounting for the quality of each worker; and (2) estimate the quality of the workers by comparing their submitted answers to the inferred correct answers.
However, the output of the EM method changes with the selection of initial seeds, and a bad selection of seeds may produce low-quality results.

Hybrid Approach. Working with Google, Ipeirotis et al. [35] describe Quizz, a gamified crowdsourcing system that simultaneously assesses the knowledge of users and acquires new knowledge from them. Quizz operates by asking users to complete short quizzes on specific topics; as a user answers the quiz questions, Quizz estimates the user's competence. To acquire new knowledge, Quizz also incorporates questions for which no known answer exists; the answers given by competent users provide useful signals for selecting the correct answers to these questions. Their experiments involve over ten thousand users and confirm that Quizz can automatically identify users with the desired expertise and interest in a given topic, at a cost below that of hiring workers through paid crowdsourcing platforms. The downside of Quizz is that it may fall into the "smart spammer" problem, where workers perform competently initially but then behave like spammers by giving random answers, because they want to get the reward as quickly as possible.

7.3 Hypotheses in Social Stream Mining

As defined in Chapter 1, in social streams an event is a set of related stories, and a story is a set of similar posts with high cohesion. The goal of social stream mining is to detect and track stories and events from social streams. To help achieve this goal, we rely on two fundamental hypotheses for social stream mining in this thesis.

Hypothesis 1 When modeling social streams, users prefer models in the form of correlated posts to models in the form of individual posts.

Figure 7.1: Crowdsourcing task for Hypothesis 1.

With Hypothesis 1, we can model a social stream as a post network based on post correlations. Hypothesis 1 is fundamental to the modeling of unstructured social streams in this thesis, since correlated posts provide a better user experience than individual posts. All the data mining techniques proposed in this thesis essentially build on Hypothesis 1, since these techniques are graph mining algorithms that model a collection of posts as a post network.

In Figure 7.1, we show our proposed user study for Hypothesis 1. Supposing the search is for "MH370", we show Result 1 and Result 2 to Amazon workers. Result 1 lists tweets containing MH370 one by one, ranked by freshness. Result 2 lists grouped tweets, e.g., "MH370 Search Updates", "MH370 Causes of Disappearance", etc., with each group telling one event related to MH370. We then provide the following five options:
• Option A: Result 1 is much better than Result 2;
• Option B: Result 1 is slightly better than Result 2;
• Option C: They have no difference;
• Option D: Result 2 is slightly better than Result 1;
• Option E: Result 2 is much better than Result 1.

The crowdsourcing task for testing Hypothesis 1 asks Amazon workers to answer the question by choosing an option. To be fair, we do not inform Amazon workers which kinds of algorithms we use to generate the individual and correlated results.
Hypothesis 2 To model stories/events in social streams, a structural approach is better than frequency-based and LDA-based approaches.

Hypothesis 2 is fundamental to the cohesion, context and evolution problems studied in this thesis. If stories/events were not defined in a structural way, it would be extremely hard to define notions like story cohesion, story context and event evolution. With Hypothesis 2, we can model a story/event as a dense subgraph inside the post network, and subsequently the cohesion, context and evolution problems can be defined and studied from the graph mining perspective.

In Figure 7.2, we show our crowdsourcing task for Hypothesis 2. We provide five options, which correspond to results generated by four different story detection approaches:
• Option A: (frequency-based) top hashtags;
• Option B: (frequency-based) top entities;
• Option C: (LDA-based) LDA approaches;
• Option D: (structural) cluster method;
• Option E: They have no difference.

Figure 7.2: Crowdsourcing task for Hypothesis 2. Four results are shown, and workers are asked: "The following options try to illustrate the top events detected from tweets in a short time span. Which option do you consider illustrates the top events in the best way?"

We then ask for Amazon workers' preference for the best result, by selecting one of the options. Notice that, for the sake of fairness, we do not show the actual approach names in the crowdsourcing task, so a worker who votes for an option has no idea of the algorithmic principle behind that option.

7.4 Quality Control for Crowdsourcing

Distinguishing Workers. Amazon Mechanical Turk (MTurk) is the most popular crowdsourcing Internet marketplace. As reported in January 2011, it has over 500,000 workers from over 190 countries. Besides normal MTurk workers, it is well known that a large number of Amazon workers are actually spammers and bots. In this user study, we distinguish the following four categories of workers:
• Bots. A bot is an automatic program that attempts to answer the task. Bots are designed to get the reward, and a bot usually returns the answers within a very short time, typically not realistic for an average human.
• Spammers. A spammer is a real worker who provides nonsensical answers. They are real persons whose sole target is to get the reward, without reading the instructions carefully or doing the necessary thinking or work.
• Unqualified Workers. An unqualified worker is a real worker who cannot pass the qualification test. In the case of social stream mining, unqualified workers include workers without the necessary comprehension ability: they either cannot understand the instructions very well, or fail to make a rational decision. It is worth noting that an unqualified worker for task A may be a qualified worker for task B.
• Qualified Workers. A qualified worker is a real worker who passes the qualification test. Depending on the rules of the qualification test, a qualified worker may fail some qualification questions, as long as the portion of correctly answered questions is larger than a predefined threshold, e.g., 0.6. Worker quality management in crowdsourcing tasks aims to find a sufficiently large number of qualified workers.

Typically, the minimum time constraint can distinguish a bot from a real person, because a bot usually returns the answers within a very short time, which is unrealistic for a human. However, it is possible that a bot is designed to wait a minimum elapsed time before it responds to a task.
In this case, the historical approval rate constraint can easily filter out bots and spammers, since they usually answer randomly and have a very low approval rate. Clearly, bots and spammers cannot pass the qualification test. A qualification test is designed to distinguish qualified workers from unqualified workers. Depending on the effectiveness of the qualification test, all unqualified workers are rejected and are not allowed to submit work for the designed crowdsourcing tasks.

Smart Spammers. In this user study, we use the term "Smart Spammers" to refer to a category of qualified workers who pass the qualification test, but make nonsensical submissions to a part of the subsequent crowdsourcing tasks. Smart spammers prove their ability on the qualifying task, and thereafter their behavior resembles that of a spammer who focuses simply on getting the reward with very little actual work. Notice that a smart spammer may perform like a spammer in only a part of the crowdsourcing tasks. Thus, a qualified worker can be either a smart spammer or a "valuable worker" who really contributes to the crowdsourcing tasks. The detection of smart spammers is a challenging problem. In this user study, we propose Expectation-Maximization with Qualification (EMQ), an iterative approach to measuring worker quality by combining workers' performance in the qualification test and in the subsequent crowdsourcing tasks. EMQ is designed to detect abuse of the system by assigning low quality scores to workers with a high probability of being smart spammers.

Quality Control Workflow. To guarantee maximum effectiveness, we use multiple techniques to control the quality of workers in this user study. The workflow of the quality control is shown in Figure 7.3. In the beginning, we apply the Approval Rate Constraint with a sufficiently high threshold to filter out MTurk workers who have a very low historical approval rate. For the remaining workers, we administer the qualification test, a series of questions with known answers. These qualification questions are treated as the gold standard, and workers' qualification is measured by the ratio of questions answered correctly. If the ratio is higher than a predefined threshold, the worker is treated as a qualified worker. After this step, nearly all bots, spammers and unqualified workers are blocked before the start of the real crowdsourcing tasks. All qualified workers then submit their work on the real crowdsourcing tasks. Since there are no predefined answers for the crowdsourcing tasks, worker quality is measured by cross-comparison with peers in an iterative way, captured by Expectation-Maximization with Qualification (EMQ). For smart spammers who passed the qualification test but submitted random answers to the crowdsourcing tasks, quality scores are lowered iteratively, because their random answers deviate distinctly from the majority vote. Thus, EMQ is capable of measuring worker quality in crowdsourcing and punishing smart spammers among all qualified workers, by assigning them low quality scores.

Figure 7.3: The workflow of the quality control steps: the Approval Rate Constraint removes bots and spammers, the Qualification Test removes bots, spammers and unqualified workers, and Expectation-Maximization with Qualification punishes smart spammers, leaving the valuable workers.
We discuss the EMQ approach in detail below.

Expectation-Maximization with Qualification (EMQ). The EMQ approach measures worker quality in crowdsourcing iteratively. To formalize EMQ, let q denote the quality vector, where qi is the quality score of worker i at each iteration. Assuming there are n workers, we have

q = [q1  q2  q3  · · ·  qn]    (7.1)

Next, let V be the voting matrix between workers and questions, where each element Vij is a 0/1 vector describing worker i's vote on the options of question j. As an example, V12 = [0 0 1 0 0] means worker 1 chooses the 3rd option of question 2. Assuming there are m questions, we get

V = [ V11  V12  · · ·  V1m
      V21  V22  · · ·  V2m
       · · ·  · · ·  · · ·
      Vn1  Vn2  · · ·  Vnm ]    (7.2)

We also let D denote the distribution of weights over the options of each question, where each question has l options. Thus, D is an m × l matrix, and the elements of each row Dj∗ sum to 1, i.e., Σk Djk = 1.

During the EMQ process, V is fixed, while q and D are updated at each iteration. To initialize q, since EMQ uses a worker's score in the qualification test as the initial quality, we set qi to the percentage of questions answered correctly by worker i in the qualification test, and then normalize q by q = q/‖q‖1. At each iteration, EMQ works as follows:
• Step 1:
  - Computation: Dj∗ = Σ_{i=1..n} qi Vij;
  - Normalization: Dj∗ = Dj∗ / ‖Dj∗‖1;
• Step 2:
  - Computation: qi = Σ_{j=1..m} Dj∗ V^T_{ij};
  - Normalization: q = q/‖q‖1.

To explain: in Step 1, we treat qi as the weight of worker i with which to grade each answer provided by that worker in the crowdsourcing tasks, so that each question obtains a distribution of weighted votes over its options, denoted Dj∗. In Step 2, we re-calculate the weight of each worker by cross-comparison between the worker's vote on question j and the current distribution of votes over the options of question j. In each iteration, we normalize the option weight distribution vectors and the quality vector, where ‖v‖1 denotes the l1-norm, i.e., ‖v‖1 = Σi |vi|.

The major steps of EMQ are shown in Algorithm 10. Recall that while the correct answer to each qualification question is predefined, we do not know the correct answers for the crowdsourcing tasks. Basically, EMQ computes the distribution of answers for each question based on the workers' quality vector and the voting matrix, and the workers' quality vector is then re-evaluated based on the distribution matrix between questions and answers. This process continues until the quality vector q converges, i.e., the l1-norm of (qo − q) is less than a small given threshold δ, or the maximum number of iterations K is reached. We empirically set δ = 0.0001.

Algorithm 10: Expectation-Maximization with Qualification (EMQ)
Input: Users' answers in the qualification test, users' voting matrix V in the crowdsourcing tasks, qualification threshold τ, convergence threshold δ, max iterations K
Output: User quality vector q
1   for each worker i do
2     compute qi, the percentage of questions answered correctly in the qualification test;
3     if qi < τ then
4       mark worker i as unqualified and remove;
5   set q as the qualification score vector of the qualified workers;
6   set k = 0;
7   q = q/‖q‖1;
8   while k < K do
9     qo = q;
10    Dj∗ = Σ_{i=1..n} qi Vij;
11    Dj∗ = Dj∗ / ‖Dj∗‖1;
12    qi = Σ_{j=1..m} Dj∗ V^T_{ij};
13    q = q/‖q‖1;
14    if ‖qo − q‖1 < δ then
15      return q;
16    k = k + 1;
17  return q;
Suppose that there is a voting example with 3 workers and 3 ques-tions, where each question has two options A and B, as shown in Figure7.4(a). Figure 7.4(b) assumes that workers’ initial quality score vector ob-tained from the qualification test is [0.6, 0.8, 1.0], which is [0.2500 0.33330.4167] after the normalization in initialization. In the first iteration, wecompute the distribution of aggregated quality scores between options foreach question, as shown in Figure 7.4(c). In detail, for question Q1, wecompute 0.2500+0.4167=0.6667 for option A and 0.3333 for option B, andsimilarly for question Q2 and Q3. In turn, to update the quality scores, wecompare the weight distribution of options for each question and worker’svote for that question. For example, worker W1 answered A for Q1 andQ2, B for Q3, so W1 will get a total score of 0.6667+0.5833+0.5833=1.8333.Analogously, workers W2 and W3 get 1.4999 and 1.5001 respectively. Afterthe normalization, we get the quality score vector [0.3793 0.3103 0.3103] initeration 1, which will be used as the input for iteration 2. This iterative1267.4. Quality Control for CrowdsourcingFigure 7.4: (a) A voting example with 3 workers and 3 questions, whereeach question has two options: A and B. (b) The computation of normalizedquality vector q on each iteration, where quality scores on iteration 0 areobtained from the qualification test. (c) and (d): The distributions of qualityweights among options for each question on iteration 1 and 15, respectively.process continues and reaches the convergence on the 15th iteration. As wecan see in Figure 7.4(b), while W3 has the highest initial quality score, thefinal quality score of W3 is low, indicating that W3 may be a smart spam-mer who did well in the qualification test, but provided random answersfor crowdsourcing tasks to get the reward. Thus, EMQ is capable to catchsmart spammers because their quality scores become very low after manyiterations of punishments, even though these smart spammers may get veryhigh initial quality scores in the qualification test.It is well-known that the Expectation-Maximization (EM) algorithm is ahill-climbing approach, and it can only be guaranteed to reach a local maxi-mum. When there are multiple maximas, whether we will actually reach theglobal maximum depends on where we start. Clearly, the selection of startingseeds impacts the final user quality upon convergence. Existing user studies[36, 60] based on EM typically set the seed by a uniform distribution, i.e.,elements in Q1 are equal, or by a unreliable random guess. In contrast, weconsider that worker’s performance in the qualification test serves as a goodstarting point for EM. EMQ uses the scores obtained from the qualificationtest as the prior information, which is an ideal seed to measure worker’stask-specific quality in crowdsourcing.1277.5. Experiments7.5 ExperimentsIn this section, we perform crowdsourcing-based user studies on AmazonMTurk. We employed a total of 945 workers with historical approval ratehigher than 85%, out of which 890 workers passed the qualification test.Each qualified worker will work on two crowdsourcing tasks, as shown inFigure 7.1 and 7.2 respectively.Qualification Test Design. Our tasks expect that the users have somecomprehension ability, i.e., given a set of tweets, they can figure out whatthese tweets are talking about. 
7.5 Experiments

In this section, we perform crowdsourcing-based user studies on Amazon MTurk. We employed a total of 945 workers with historical approval rates higher than 85%, of whom 890 passed the qualification test. Each qualified worker works on two crowdsourcing tasks, as shown in Figures 7.1 and 7.2, respectively.

Qualification Test Design. Our tasks expect the users to have some comprehension ability; i.e., given a set of tweets, they can figure out what the tweets are talking about. Thus, a qualification test can be built by providing multiple summarization phrases as options and asking Turkers to select the best summarization phrase capturing the given tweets. There is only one correct answer to each qualification question. To avoid making the qualification question trivial, we ensure that the summarization phrase itself does not occur in the tweets. For example, given 5 tweets talking about an event "oil price drop" but not necessarily mentioning the phrase "oil price drop", we provide users a list of 3 options, e.g., "iphone 6 price", "oil filter change" and "oil price drop". If a worker does not select "oil price drop", she fails this question; very likely, she is a bot or a worker with very low comprehension ability. We assess each worker's performance in the qualification test by a score, computed as the portion of qualification questions answered correctly; e.g., a worker who answers four out of five qualification questions correctly gets a qualification score of 0.8. In the experiments, we set the qualification threshold τ to 0.5 (see Algorithm 10), with the intuition that a qualified worker should answer at least half of the qualification questions correctly. Workers with qualification scores lower than 0.5 are considered unqualified and are rejected from participating in the subsequent crowdsourcing tasks.

We show our qualification test sample, used before the real crowdsourcing tasks, in Figure 7.5. There are a total of five questions, and we provide three options for each question. Workers who answer at least three questions correctly are selected as qualified workers. In our experiment, a total of 945 Amazon workers participated in the qualification test, out of which 890 passed with qualification scores higher than 0.5. Thus, the pass rate is 94.2%, and these 890 qualified workers proceeded to the real crowdsourcing tasks.

Figure 7.5: Qualification test sample used before the real crowdsourcing tasks. Five sets of tweets are shown; for each, workers select the best of three candidate summarization phrases (e.g., "2015 moments", "Photoshop" and "Powerful women" for the first set).

EMQ Implementation. All qualified workers are eligible to participate in the subsequent crowdsourcing tasks, where the true answer is unknown. We use EMQ to measure the credibility of each qualified worker's answers to these tasks. To implement EMQ, we set the initial seed of the user quality scores to the qualification scores. Following the steps in Algorithm 10, EMQ iterates and generates a convergent quality score for each qualified user, which in turn yields a convergent quality-weighted voting score for each crowdsourcing task.

Hypothesis Verification Using EMQ. Recall that each qualified worker works on the two crowdsourcing tasks of Figures 7.1 and 7.2, which correspond to Hypotheses 1 and 2 (denoted H1 and H2 for short), respectively. To claim Hypotheses 1 and 2 to be true in a neutral manner, we require that the majority of qualified workers vote for options D and E in H1 and for option D in H2, where each vote is weighted by the quality score. Recall from Section 7.3 that options D and E in H1 indicate that users prefer correlated posts to individual posts in social stream search, and option D in H2 means the structural approach is the best for social stream modeling. We empirically set 60% as the threshold for claiming "majority voting"; e.g., if more than 60% of the votes choose option D in H2, we say Hypothesis 2 is true.

Based on the results returned by EMQ, we compute a weighted voting sum (WVS) score for each option of each hypothesis, where the WVS of an option is the sum of the quality scores of the workers who voted for it. Both Hypotheses 1 and 2 are validated by these WVS scores, computed as sketched below.
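Computing the WVS scores from the converged quality vector is a one-pass weighted tally; a minimal sketch, where `votes[i]` is the option index chosen by worker i:

def weighted_voting_sum(q, votes, n_options):
    # WVS per option: the sum of the quality scores of the workers who
    # voted for that option, reported as each option's share of the total.
    wvs = [0.0] * n_options
    for qi, v in zip(q, votes):
        wvs[v] += qi
    total = sum(wvs)
    return [w / total for w in wvs]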
EMQ Implementation. All qualified workers are eligible to participate in the subsequent crowdsourcing tasks, where the true answer is unknown. We use EMQ to measure the credibility of each qualified worker's answers to these crowdsourcing tasks. To implement EMQ, we set the initial seed of user quality scores to the qualification scores. Following the steps in Algorithm 10, EMQ iterates until it generates a convergent quality score for each qualified worker, which in turn yields, for each crowdsourcing task, a convergent voting score weighted by user quality scores.

Hypothesis Verification Using EMQ. Each qualified worker works on two crowdsourcing tasks, shown in Figures 7.1 and 7.2, which correspond to Hypothesis 1 and Hypothesis 2 (denoted H1 and H2 for short), respectively. To judge the hypotheses in a neutral manner, we claim Hypothesis 1 or 2 to be true only if the majority of qualified workers vote for options D and E in H1 and for option D in H2, where each vote is weighted by the worker's quality score. Recall from Section 7.3 that options D and E in H1 indicate that users prefer correlated posts to individual posts in social stream search, and option D in H2 means that the structural approach is the best in social stream modeling. We empirically set 60% as the threshold for claiming "majority voting"; e.g., if more than 60% of the weighted votes choose option D in H2, then we say Hypothesis 2 is true.

Based on the results returned by EMQ, we compute a weighted voting sum (WVS) score for each option in each hypothesis, where the WVS of an option is the sum of the quality scores of the workers who voted for it. Both hypotheses are validated by these WVS scores. In Hypothesis 1, options D and E correspond to the situation that Result 2 is better than Result 1. In Hypothesis 2, if option D has a higher WVS score than each of the other options, we say structural approaches are better than frequency-based and LDA-based approaches, and the hypothesis is true.

We show the results of running EMQ to verify Hypotheses 1 and 2 in Figure 7.6. Workers' original answers are shown in Figure 7.6(a). A total of 890 workers participated in these crowdsourcing tasks, and EMQ takes 9 iterations to converge. H1 has five options: A, B, C, D and E. In Figure 7.6(b), we show the initial distribution of WVS scores before running EMQ and the final distribution after running EMQ. As we can see, the final distribution is more skewed than the initial one, which indicates that EMQ iteratively strengthens the WVS scores of options voted for by high-quality workers and weakens the WVS scores of options voted for by low-quality workers. In the end, options D and E collect 32.9% and 50.5% of the weighted votes, respectively. Recalling that option D means "result 2 is slightly better than result 1" and option E means "result 2 is much better than result 1", we conclude that 83.4% of the weighted votes agree that result 2 (in the form of correlated posts) is better than result 1 (in the form of individual posts).

Figure 7.6: Running EMQ to verify Hypotheses 1 and 2. (a) Workers' voting table on Hypotheses 1 and 2. (b) EMQ for Hypothesis 1: the percentage of weighted votes on options D/E increases from 74.4% (before EMQ) to 83.4% (after EMQ). (c) EMQ for Hypothesis 2: the percentage of weighted votes on option D increases from 56.7% (before EMQ) to 75.4% (after EMQ).

Figure 7.6(c) shows the initial and final distributions of WVS scores for options A, B, C, D and E in Hypothesis 2. As we can see, the EMQ procedure lowers the WVS scores for options A, B and C, while the WVS score for option D improves. Recall that options A and B are frequency-based approaches, option C is an LDA-based approach, and option D is the structural approach. Since option D finally collects 75.4% of the weighted votes, we conclude that the structural approach is better than the other approaches.

In conclusion, Figures 7.6(b) and 7.6(c) show that the majority of high-quality workers agree that correlated posts are better than individual posts in social stream modeling, and that the structural approach is better than the other approaches for story/event modeling in social streams.
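The WVS scores and the 60% majority rule used above can be stated compactly. The sketch below makes the computation explicit; the function names are ours, and quality holds the convergent EMQ scores.

# Sketch of the weighted voting sum (WVS) and majority check described above.
MAJORITY = 0.60  # empirical threshold for "majority voting"

def wvs(task_votes, quality):
    # task_votes: {worker: option}; quality: {worker: convergent EMQ score}.
    scores = {}
    for w, opt in task_votes.items():
        scores[opt] = scores.get(opt, 0.0) + quality[w]
    return scores  # WVS of an option = sum of quality scores of its voters

def hypothesis_holds(task_votes, quality, supporting_options):
    # supporting_options = {"D", "E"} for Hypothesis 1, {"D"} for Hypothesis 2.
    scores = wvs(task_votes, quality)
    support = sum(scores.get(o, 0.0) for o in supporting_options)
    return support / sum(scores.values()) > MAJORITY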
Detection of Smart Spammers by EMQ. Smart spammers are workers who passed the qualification test but performed like spammers in the real crowdsourcing tasks; they are very difficult to detect with existing techniques. The EMQ method is capable of detecting smart spammers because their quality scores become very low after many iterations of punishment. It is normal for some workers to have final quality scores somewhat lower than their initial quality scores, but workers whose final quality scores become extremely low could be smart spammers. We empirically define smart spammers as workers whose initial quality scores are at least five times higher than their convergent EMQ quality scores. Following this definition, we find 44 smart spammers. In other words, 4.94% of the qualified workers are actually smart spammers, a fraction that cannot be easily ignored in the quality evaluation. In contrast, existing methods cannot detect smart spammers effectively.

7.6 Discussion and Conclusion

The crowdsourcing-based hypothesis verification proposed in this chapter has several limitations:

• Crowdsourcing tasks are designed by a few domain experts, which may limit generality and introduce bias, since the tasks may not be ideally designed, in content or in form, to verify the hypotheses. For example, the current crowdsourcing task uses the "MH370" event as an example, but this event may not be well known to every worker; and even for workers who know about the MH370 event, their responses are likely to be biased by how the event is presented in the crowdsourcing tasks.

• The visualization in crowdsourcing tasks may introduce new bias. For example, to verify Hypothesis 2, we use circles with texts and links to visualize structural approaches, but the visualization itself may impose different comprehension overheads on workers with different educational levels.

• Due to the diversity of human behaviors in crowdsourcing, the smart spammer detection algorithm based on EMQ may not be effective in all situations. For example, some qualified workers may work normally on the first half of the tasks but later behave like spammers, e.g., because they become tired or want to collect the reward quickly.

There is room for improvement. First, we can employ a larger number of domain experts to design more tasks with different forms and topics, which will reduce the bias introduced by the tasks themselves. Second, better algorithms may be developed to combat smart spammers. Third, we can randomly mix the qualification questions and the actual tasks, making it hard for workers to distinguish them, so that the qualification questions and the actual tasks have equal chances of being spammed. This will make it more challenging for smart spammers to pass the qualification test.

Chapter 8

Summary and Future Research

8.1 Summary

In the current social web age, mining unstructured social streams has become a fundamental requirement for satisfying people's information-seeking needs. The existing user experience on social stream applications like Twitter Search can easily lead to "information anxiety": users input several keywords, and the output is a long list of tweets or posts containing those keywords, ranked by recency. Since a post such as a tweet or a Facebook update only carries a small piece of information, users are required to digest a long list of search results, which is time-consuming and painful.
The noisy and redundant nature of social streams degrades the user experience further. The social stream mining that we propose aims at providing users with an organized and summarized view of what's happening in their social world. In this thesis, we examined several important applications in social stream mining and proposed efficient solutions based on graph mining techniques.

Chapter 3 discussed the modeling of unstructured social streams. In detail, we model a social stream as an evolving network of posts. Stories and events are modeled as two special kinds of subgraphs: a story is a dense subgraph of posts with high cohesion in each snapshot of a post network, while an event is a cluster of posts across consecutive snapshots as the post network evolves.

In Chapter 4, we instantiate a story as a quasi-clique: given a query node set S in a graph G(V,E), we solve the maximum quasi-clique search problem, formalized as finding the largest λ-quasi-clique containing S. To quickly locate an initial solution, we propose the k-Core tree, built by recursively organizing dense subgraphs in G. Three maximization operations are introduced to optimize the solution: Add, Remove and Swap. We then propose two iterative maximization algorithms, DIM and SUM, to approach the maximum quasi-clique containing the given query node set S by deterministic and stochastic means, respectively.

In Chapter 5, we focus on two problems: (1) efficiently identifying transient stories from fast streaming social content; (2) performing iceberg queries to build the structural context between stories. To solve the first problem, we transform the social stream in a time window into a capillary network, and model transient stories as (k, d)-Cores in the capillary network. Two efficient algorithms are proposed to extract maximal (k, d)-Cores. For the second problem, we propose deterministic context search and randomized context search to support the iceberg query, which allows context search to be performed without pairwise comparisons. We conducted a detailed experimental study on real Twitter streams, and the results demonstrate the effectiveness and value of our proposed context-aware story-teller CAST.

Our main goal in Chapter 6 is to track event evolution patterns in highly dynamic post networks. To that end, we summarize the network by a skeletal graph and monitor the updates to the post network by means of a sliding time window. We then design a set of primitive operations and express the cluster evolution patterns using these operations. Unlike previous approaches, our evolution tracking algorithm eTrack performs incremental bulk updates in real time. We deploy our approach on the event evolution tracking task in social streams, and experimentally demonstrate its performance and quality on two real data sets crawled from Twitter.

Finally, Chapter 7 performed crowdsourcing-based user studies to validate two important hypotheses. These two hypotheses are fundamental to our mining tasks on social streams. The key problem in the crowdsourcing-based user studies is the quality evaluation of workers. We distinguish Amazon MTurk workers into four types: bots, spammers, unqualified workers and qualified workers. While bots and spammers can be detected by traditional techniques such as a minimum time constraint and an approval rate constraint, a carefully designed qualification test is required to distinguish unqualified from qualified workers.
We discussed the detailed qualification test design for crowdsourcing tasks in the preceding sections. As a special kind of qualified worker, smart spammers are workers who passed the qualification test but sometimes perform like spammers in the subsequent crowdsourcing tasks. The detection of smart spammers is a challenging and unsolved problem. We used Expectation-Maximization with Qualification (EMQ), which is capable of measuring a user's quality in crowdsourcing and detecting smart spammers from among all qualified workers.

8.2 Future Research

This thesis has made substantial progress in the study of cohesion, context and evolution problems in unstructured social stream mining. In future research, there are still several open opportunities and challenges in social stream mining, as listed below.

• It would be interesting to investigate the incremental evolution of social emotions and sentiments for business intelligence, associated with the evolution of events.

• We look forward to performing advanced analytics on stories and events, such as the spread paths of rumors on social networks, personalized recommendation of new events with GPS signals, etc.

• We are interested in various novel visualization techniques to present transient stories and incrementally updating events to end users in a friendly way.

Bibliography

[1] James Abello, Mauricio G. C. Resende, and Sandra Sudarsky. Massive quasi-clique detection. In LATIN, pages 598–612, 2002.

[2] Manoj K. Agarwal, Krithi Ramamritham, and Manish Bhide. Real time discovery of dense clusters in highly dynamic graphs: Identifying real world events in highly dynamic environments. PVLDB, 5(10):980–991, 2012.

[3] Charu C. Aggarwal, Jiawei Han, Jianyong Wang, and Philip S. Yu. A framework for clustering evolving data streams. In VLDB, pages 81–92, 2003.

[4] James Allan, editor. Topic detection and tracking: event-based information organization. Kluwer Academic Publishers, 2002.

[5] Albert Angel, Nick Koudas, Nikos Sarkas, and Divesh Srivastava. Dense subgraph maintenance under streaming edge weight updates for real-time story identification. PVLDB, 5(6):574–585, 2012.

[6] Sanjeev Arora and Shmuel Safra. Probabilistic checking of proofs: A new characterization of NP. J. ACM, 45(1):70–122, 1998.

[7] Yuichi Asahiro, Refael Hassin, and Kazuo Iwama. Complexity of finding dense subgraphs. Discrete Applied Mathematics, 121(1-3):15–26, 2002.

[8] Balabhaskar Balasundaram, Sergiy Butenko, and Illya V. Hicks. Clique relaxations in social network analysis: The maximum k-plex problem. Operations Research, 59(1):133–142, 2011.

[9] Roberto Battiti and Marco Protasi. Reactive local search for the maximum clique problem. Algorithmica, 29(4):610–637, 2001.

[10] Hila Becker, Mor Naaman, and Luis Gravano. Learning similarity metrics for event identification in social media. In WSDM, pages 291–300, 2010.

[11] Pavel Berkhin. Survey: A survey on PageRank computing. Internet Mathematics, 2(1):73–120, 2005.

[12] Seth Bicknell. How to successfully use Amazon's Mechanical Turk. Retrieved in January 2015.

[13] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.

[14] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of communities in large networks. J. Stat. Mech., 2008.

[15] Mauro Brunato, Holger H. Hoos, and Roberto Battiti. On effectively finding maximal quasi-cliques in graphs. In LION, pages 41–55, 2007.
[16] Chris Callison-Burch. Fast, cheap, and creative: Evaluating translation quality using Amazon's Mechanical Turk. In EMNLP, pages 286–295, 2009.

[17] Feng Cao, Martin Ester, Weining Qian, and Aoying Zhou. Density-based clustering over an evolving data stream with noise. In SDM, 2006.

[18] Yixin Chen and Li Tu. Density-based clustering for real-time stream data. In KDD, pages 133–142, 2007.

[19] Wanyun Cui, Yanghua Xiao, Haixun Wang, Yiqi Lu, and Wei Wang. Online search of overlapping communities. In SIGMOD, pages 277–288, 2013.

[20] Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In KDD, pages 226–231, 1996.

[21] David Nadeau et al. A survey of named entity recognition and classification. Lingvisticae Investigationes, 30(1):3–26, 2007.

[22] Li Wan et al. Density-based clustering of data streams at multiple resolutions. TKDD, 3(3), 2009.

[23] U. Feige, S. Goldwasser, L. Lovász, S. Safra, and M. Szegedy. Approximating clique is almost NP-complete. In Foundations of Computer Science, pages 2–12, 1991.

[24] Karën Fort, Gilles Adda, and K. Bretonnel Cohen. Amazon Mechanical Turk: Gold mine or coal mine? Computational Linguistics, 37(2):413–420, 2011.

[25] Santo Fortunato. Community detection in graphs. CoRR, abs/0906.0612, 2009.

[26] Gabriel Pui Cheong Fung, Jeffrey Xu Yu, Philip S. Yu, and Hongjun Lu. Parameter free bursty events detection in text streams. In VLDB, pages 181–192, 2005.

[27] Zekai Gao, Yangqiu Song, Shixia Liu, Haixun Wang, Hao Wei, Yang Chen, and Weiwei Cui. Tracking and connecting topics via incremental hierarchical Dirichlet processes. In ICDM, pages 1056–1061, 2011.

[28] Christos Giatsidis, Dimitrios M. Thilikos, and Michalis Vazirgiannis. Evaluating cooperation in communities with the k-core structure. In ASONAM, pages 87–93, 2011.

[29] Martin Halvey and Mark T. Keane. An assessment of tag presentation techniques. In WWW, pages 1313–1314, 2007.

[30] Jiawei Han, Micheline Kamber, and Jian Pei. Data Mining: Concepts and Techniques. Morgan Kaufmann, 2011.

[31] J. Håstad. Clique is hard to approximate within n^{1-ε}. Acta Mathematica, 182:105–142, 1999.

[32] Yulan He, Chenghua Lin, Wei Gao, and Kam-Fai Wong. Tracking sentiment and topic dynamics from social media. In ICWSM, 2012.

[33] Holger H. Hoos and Thomas Stützle. Stochastic local search: Foundations & applications. Morgan Kaufmann, 2004.

[34] Xin Huang, Hong Cheng, Lu Qin, Wentao Tian, and Jeffrey Xu Yu. Querying k-truss community in large and dynamic graphs. In SIGMOD, pages 1311–1322, 2014.

[35] Panagiotis G. Ipeirotis and Evgeniy Gabrilovich. Quizz: targeted crowdsourcing with a billion (potential) users. In WWW, pages 143–154, 2014.

[36] Panagiotis G. Ipeirotis, Foster Provost, and Jing Wang. Quality management on Amazon Mechanical Turk. In Proceedings of the ACM SIGKDD Workshop on Human Computation (HCOMP), pages 64–67, 2010.
[37] Xin Jin, W. Scott Spangler, Rui Ma, and Jiawei Han. Topic initiator detection on the world wide web. In WWW, pages 481–490, 2010.

[38] Min-Soo Kim and Jiawei Han. A particle-and-density based evolutionary clustering method for dynamic networks. PVLDB, 2(1):622–633, 2009.

[39] Dan Klein and Christopher D. Manning. Accurate unlexicalized parsing. In ACL, pages 423–430, 2003.

[40] Bill Kovach and Tom Rosenstiel. Blur: How to Know What's True in the Age of Information Overload. Bloomsbury Publishing USA, 2010.

[41] Pei Lee and Laks V. S. Lakshmanan. Query-driven maximum quasi-clique search. In SDM, 2016.

[42] Pei Lee, Laks V. S. Lakshmanan, and Evangelos E. Milios. KeySee: supporting keyword search on evolving events in social streams. In KDD, pages 1478–1481, 2013.

[43] Pei Lee, Laks V. S. Lakshmanan, and Evangelos E. Milios. CAST: A context-aware story-teller for streaming social content. In CIKM, 2014.

[44] Pei Lee, Laks V. S. Lakshmanan, and Evangelos E. Milios. Incremental cluster evolution tracking from highly dynamic network data. In ICDE, pages 3–14, 2014.

[45] Pei Lee, Laks V. S. Lakshmanan, and Jeffrey Xu Yu. On top-k structural similarity search. In ICDE, pages 774–785, 2012.

[46] Pei Lee, Laks V. S. Lakshmanan, and Evangelos E. Milios. Event evolution tracking from streaming social posts. Technical report, 2013.

[47] Victor E. Lee, Ning Ruan, Ruoming Jin, and Charu C. Aggarwal. A survey of algorithms for dense subgraph discovery. In Managing and Mining Graph Data, pages 303–336. 2010.

[48] Jure Leskovec, Lars Backstrom, and Jon M. Kleinberg. Meme-tracking and the dynamics of the news cycle. In KDD, pages 497–506, 2009.

[49] Pei Li, Jeffrey Xu Yu, Hongyan Liu, Jun He, and Xiaoyong Du. Ranking individuals and groups by influence propagation. In PAKDD, pages 407–419, 2011.

[50] Guimei Liu and Limsoon Wong. Effective pruning techniques for mining quasi-cliques. In ECML/PKDD (2), pages 33–49, 2008.

[51] Ning Liu. Topic detection and tracking. In Encyclopedia of Database Systems, pages 3121–3124. 2009.

[52] Helena R. Lourenço, Olivier C. Martin, and Thomas Stützle. Iterated local search. In Handbook of Metaheuristics, pages 320–353. Springer US, 2003.

[53] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.

[54] Adam Marcus, Michael S. Bernstein, Osama Badar, David R. Karger, Samuel Madden, and Robert C. Miller. Processing and visualizing the data in tweets. SIGMOD Record, 40(4):21–27, 2011.

[55] Adam Marcus, Michael S. Bernstein, Osama Badar, David R. Karger, Samuel Madden, and Robert C. Miller. Twitinfo: aggregating and visualizing microblogs for event exploration. In CHI, pages 227–236, 2011.

[56] David W. Matula and Leland L. Beck. Smallest-last ordering and clustering and graph coloring algorithms. Journal of the ACM, 30(3):417–427, 1983.

[57] Alberto Montresor, Francesco De Pellegrini, and Daniele Miorandi. Distributed k-core decomposition. IEEE Trans. Parallel Distrib. Syst., 24(2):288–300, 2013.
[58] Matteo Negri and Yashar Mehdad. Creating a bi-lingual entailment corpus through translations with Mechanical Turk: $100 for a 10-day rush. In Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk, pages 212–216, 2010.

[59] Tadashi Nomoto. Two-tier similarity model for story link detection. In CIKM, pages 789–798, 2010.

[60] Aditya G. Parameswaran, Stephen Boyd, Hector Garcia-Molina, Ashish Gupta, Neoklis Polyzotis, and Jennifer Widom. Optimal crowd-powered rating and filtering algorithms. PVLDB, 7(9):685–696, 2014.

[61] Panos M. Pardalos and Steffen Rebennack. Computational challenges with cliques, quasi-cliques and clique partitions in graphs. In SEA, pages 13–22, 2010.

[62] Jeffrey Pattillo, Alexander Veremyev, Sergiy Butenko, and Vladimir Boginski. On the maximum quasi-clique problem. Discrete Applied Mathematics, 161(1-2):244–257, 2013.

[63] Jian Pei, Daxin Jiang, and Aidong Zhang. Mining cross-graph quasi-cliques in gene expression and protein interaction data. In ICDE, pages 353–354, 2005.

[64] Mauricio G. C. Resende. Greedy randomized adaptive search procedures. In Encyclopedia of Optimization, pages 1460–1469. 2009.

[65] Kazumi Saito, Takeshi Yamada, and Kazuhiro Kazama. The k-dense method to extract communities from complex networks. In Mining Complex Data, pages 243–257. 2009.

[66] Takeshi Sakaki, Makoto Okazaki, and Yutaka Matsuo. Earthquake shakes Twitter users: real-time event detection by social sensors. In WWW, pages 851–860, 2010.

[67] Anish Das Sarma, Alpa Jain, and Cong Yu. Dynamic relationship and event discovery. In WSDM, pages 207–216, 2011.

[68] Chirag Shah, W. Bruce Croft, and David Jensen. Representing documents with named entities for story link detection (SLD). In CIKM, pages 868–869, 2006.

[69] Dafna Shahaf, Jaewon Yang, Caroline Suen, Jeff Jacobs, Heidi Wang, and Jure Leskovec. Information cartography: creating zoomable, large-scale maps of information. In KDD, pages 1097–1105, 2013.

[70] David Sontag and Dan Roy. Complexity of inference in latent Dirichlet allocation. In NIPS, pages 1008–1016, 2011.

[71] Mauro Sozio and Aristides Gionis. The community search problem and how to plan a successful cocktail party. In KDD, pages 939–948, 2010.

[72] Matthew Staffelbach, Peter Sempolinski, David Hachen, Ahsan Kareem, Tracy Kijewski-Correa, Douglas Thain, Daniel Wei, and Greg Madey. Lessons learned from an experiment in crowdsourcing complex citizen engineering tasks with Amazon Mechanical Turk. CoRR, abs/1406.7588, 2014.

[73] Charalampos E. Tsourakakis, Francesco Bonchi, Aristides Gionis, Francesco Gullo, and Maria A. Tsiarli. Denser than the densest subgraph: extracting optimal quasi-cliques with quality guarantees. In KDD, pages 104–112, 2013.

[74] Takeaki Uno. An efficient algorithm for solving pseudo clique enumeration problem. Algorithmica, 56(1):3–16, 2010.

[75] Nan Wang, Srinivasan Parthasarathy, Kian-Lee Tan, and Anthony K. H. Tung. CSV: visualizing and mining cohesive subgraphs. In SIGMOD Conference, pages 445–458, 2008.

[76] Duncan J. Watts and Steven H. Strogatz. Collective dynamics of 'small-world' networks. Nature, pages 440–442, 1998.

[77] Jianshu Weng and Bu-Sung Lee. Event detection in Twitter. In ICWSM, 2011.

[78] Yiming Yang, Tom Ault, Thomas Pierce, and Charles W. Lattimer. Improving text categorization methods for event tracking. In SIGIR, pages 65–72, 2000.

[79] Xiaohan Zhao, Alessandra Sala, Christo Wilson, Xiao Wang, Sabrina Gaito, Haitao Zheng, and Ben Y. Zhao. Multi-scale dynamics in a massive online social network. In IMC, pages 171–184, 2012.
