UBC Theses and Dissertations
Mining unstructured social streams: cohesion, context and evolution
Li, Pei
As social websites like Twitter greatly influence people's digital lives, unstructured social streams have become prevalent: fast-surging streams of textual posts with no formal structure or schema, either between posts or within the post content. Modeling and mining unstructured social streams in Twitter has become a challenging and fundamental problem in social web analysis, with numerous applications, e.g., recommending social feeds such as "what's happening right now?" or "what are the related stories?". Current social stream analysis merely returns an overwhelming list of posts in response to queries, with little aggregation or semantics. The design of next-generation social stream mining algorithms faces various challenges, especially the effective organization of meaningful information from noisy, unstructured, and streaming social content. The goal of this dissertation is to address the most critical challenges in social stream mining using graph-based techniques. We model a social stream as a post network, and use "event" and "story" to capture groups of aggregated social posts presenting similar content at different granularities, where an event may contain a series of stories. We highlight our contributions to social stream mining from a structural perspective as follows. We first model a story as a quasi-clique, whose cohesion persists regardless of the story size, and propose two solutions, DIM and SUM, that search for the largest story containing given query posts by deterministic and stochastic means, respectively. To detect all stories in the time window of a social stream and to support context-aware storytelling, we propose CAST, which defines a story as a (k,d)-Core in the post network and tracks the relatedness between stories.
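The structural story models above can be pictured with a minimal sketch. It assumes a toy post network in which two posts are linked when they share at least `min_shared` keywords (a hypothetical similarity rule, not the dissertation's), tests the common degree-based γ-quasi-clique condition, and peels a plain k-core as a simpler relative of CAST's (k,d)-Core; the additional constraint associated with d is not reproduced here. It only checks cohesion and does not search for the largest story as DIM and SUM do. The names `build_post_network`, `is_quasi_clique`, and `k_core` are illustrative only.

```python
from itertools import combinations

def build_post_network(posts, min_shared):
    """Link two posts if they share at least `min_shared` keywords (hypothetical rule)."""
    graph = {p: set() for p in posts}
    for a, b in combinations(posts, 2):
        if len(posts[a] & posts[b]) >= min_shared:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def is_quasi_clique(graph, nodes, gamma):
    """Degree-based gamma-quasi-clique test: every member is adjacent to
    at least gamma * (|nodes| - 1) other members."""
    nodes = set(nodes)
    need = gamma * (len(nodes) - 1)
    return all(len(graph[v] & nodes) >= need for v in nodes)

def k_core(graph, k):
    """Repeatedly remove vertices with fewer than k neighbours among the survivors."""
    alive = set(graph)
    changed = True
    while changed:
        changed = False
        for v in list(alive):
            if len(graph[v] & alive) < k:
                alive.discard(v)
                changed = True
    return alive

if __name__ == "__main__":
    # Toy keyword sets standing in for posts (illustrative data only).
    posts = {
        "p1": {"quake", "tokyo", "tsunami"},
        "p2": {"quake", "tokyo", "alert"},
        "p3": {"quake", "tokyo", "tsunami"},
        "p4": {"quake", "tokyo", "news"},
        "p5": {"cat", "video"},
    }
    g = build_post_network(posts, min_shared=2)
    print(is_quasi_clique(g, {"p1", "p2", "p3", "p4"}, gamma=0.8))  # True
    print(sorted(k_core(g, 2)))  # ['p1', 'p2', 'p3', 'p4']
```

The quasi-clique test scales its degree requirement with the group size, which is why it stays meaningful for both small and large stories, whereas the k-core uses a fixed degree threshold k.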
We then propose Incremental Cluster Evolution Tracking (ICET), an incremental computation framework for event evolution on evolving post networks that can track the evolution patterns of social events over time. The approaches in this dissertation rest on two hypotheses: users prefer correlated posts to individual posts in post stream modeling, and a structural approach is better than frequency- or LDA-based approaches in event and story modeling. We verify these hypotheses through crowdsourcing-based user studies.
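One way to picture the evolution patterns that ICET tracks is to compare clusters from two consecutive snapshots of the post network. The sketch below is a from-scratch snapshot comparison, not ICET's incremental computation: it links clusters whose Jaccard overlap reaches a hypothetical `threshold` and labels each transition as emerge, disappear, continue, merge, or split. The function names, the threshold, and the pattern labels are assumptions for illustration.

```python
def jaccard(a, b):
    """Jaccard overlap of two post-ID sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def track_evolution(prev, curr, threshold=0.3):
    """Label cluster transitions between two snapshots (hypothetical matching rule)."""
    # Link every old cluster to every new cluster it overlaps enough with.
    links = [(i, j) for i, a in prev.items() for j, b in curr.items()
             if jaccard(a, b) >= threshold]
    succ = {i: [j for (x, j) in links if x == i] for i in prev}
    pred = {j: [i for (i, y) in links if y == j] for j in curr}
    patterns = []
    for j, ps in pred.items():
        if not ps:
            patterns.append(("emerge", j))            # no ancestor
        elif len(ps) > 1:
            patterns.append(("merge", tuple(sorted(ps)), j))
    for i, ss in succ.items():
        if not ss:
            patterns.append(("disappear", i))         # no descendant
        elif len(ss) > 1:
            patterns.append(("split", i, tuple(sorted(ss))))
        elif len(pred[ss[0]]) == 1:
            patterns.append(("continue", i, ss[0]))   # one-to-one match
    return patterns

if __name__ == "__main__":
    prev = {"A": {1, 2, 3, 4}, "B": {5, 6, 7}, "C": {8, 9}}
    curr = {"X": {1, 2, 3, 4, 5, 6, 7}, "Y": {10, 11}}
    print(track_evolution(prev, curr))
    # [('merge', ('A', 'B'), 'X'), ('emerge', 'Y'), ('disappear', 'C')]
```

Recomputing and re-matching all clusters at every snapshot, as this sketch does, is exactly the cost an incremental framework aims to avoid when posts arrive continuously.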
Attribution-NonCommercial-NoDerivatives 4.0 International