Open Collections
UBC Theses and Dissertations
Decision theoretic learning of human facial displays and gestures
Hoey, Jesse
Abstract
We present a vision-based, adaptive, decision-theoretic model of human facial displays and gestures in interaction. Changes in the human face occur due to many factors, including communication, emotion, speech, and physiology. Most systems for facial expression analysis attempt to recognize one or more of these factors, resulting in a machine whose inputs are video sequences or static images, and whose outputs are, for example, basic emotion categories. Our approach is fundamentally different. We make no prior commitment to some particular recognition task. Instead, we consider that the meaning of a facial display for an observer is contained in its relationship to actions and outcomes. Agents must distinguish facial displays according to their affordances, or how they help an agent to maximize utility. To this end, our system learns relationships between the movements of a person's face, the context in which they are acting, and a utility function. The model is a partially observable Markov decision process, or POMDP. The video observations are integrated into the POMDP using a dynamic Bayesian network, which creates spatial and temporal abstractions amenable to decision making at the high level. The parameters of the model are learned from training data using an a posteriori constrained optimization technique based on the expectation-maximization algorithm. The training does not require labeled data, since we do not train classifiers for individual facial actions and then integrate them into the model. Rather, the learning process discovers clusters of facial motions and their relationship to the context automatically. As such, it can be applied to any situation in which non-verbal gestures are purposefully used in a task. We present an experimental paradigm in which we record two humans playing a collaborative game, or a single human playing against an automated agent, and learn the human behaviors. We use the resulting model to predict human actions. We show results on three simple games.
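As a point of orientation only, the sketch below shows the kind of belief-state update that underlies any discrete POMDP of the sort the abstract describes. It is not taken from the thesis: the state, action, and observation spaces, and the transition and observation matrices, are hypothetical placeholders.

```python
# Minimal sketch, assuming a small discrete POMDP with hypothetical
# transition matrix T[a, s, s'] and observation matrix O[a, s', o].
# This is a generic Bayes-filter update, not the thesis's implementation.
import numpy as np

def belief_update(belief, action, observation, T, O):
    """Compute b'(s') proportional to O[a, s', o] * sum_s T[a, s, s'] * b(s)."""
    predicted = belief @ T[action]                   # predict next-state distribution
    updated = predicted * O[action][:, observation]  # weight by observation likelihood
    return updated / updated.sum()                   # renormalize to a distribution

# Toy example: 2 hidden "display" states, 1 action, 2 observation symbols.
T = np.array([[[0.9, 0.1],
               [0.2, 0.8]]])      # T[a, s, s']: state transition probabilities
O = np.array([[[0.8, 0.2],
               [0.3, 0.7]]])      # O[a, s', o]: observation probabilities
b0 = np.array([0.5, 0.5])         # uniform prior over hidden states

b1 = belief_update(b0, action=0, observation=1, T=T, O=O)
print(b1)                         # posterior over hidden states after one observation
```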
Item Metadata
Title: Decision theoretic learning of human facial displays and gestures
Creator: Hoey, Jesse
Publisher: University of British Columbia
Date Issued: 2003
Extent: 30711653 bytes
Genre:
Type:
File Format: application/pdf
Language: eng
Date Available: 2009-11-27
Provider: Vancouver : University of British Columbia Library
Rights: For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
DOI: 10.14288/1.0051546
URI:
Degree:
Program:
Affiliation:
Degree Grantor: University of British Columbia
Graduation Date: 2003-11
Campus:
Scholarly Level: Graduate
Aggregated Source Repository: DSpace