Unsupervised statistical models for general object recognition

UBC Theses and Dissertations

Featured Collection

UBC Theses and Dissertations

Unsupervised statistical models for general object recognition Carbonetto, Peter

Abstract

We approach the object recognition problem as the process of attaching meaningful labels to specific regions of an image. Given a set of images and their captions, we segment the images, in either a crude or sophisticated fashion, then learn the proper associations between words and regions. Previous models are limited by the scope of the representation, and performance is constrained by noise from poor initial clusterings of the image features. We propose three improvements that address these issues. First, we describe a model that incorporates clustering into the learning step using a basic mixture model. Second, we propose Bayesian priors on the mixture model to stabilise learning and automatically weight features. Third, we develop a more expressive model that learns spatial relations between regions of a scene. Using the analogy of building a lexicon via an aligned bitext, we formulate a probabilistic mapping between the image feature vectors and the supplied word tokens. To find the best hypothesis, we hill-climb the log-posterior using the EM algorithm. Spatial context introduces cycles to our probabilistic graphical model, so we use loopy belief propagation to compute the expectation of the complete log-posterior, and iterative scaling and iterative proportional fitting on the pseudo-likelihood approximation to render parameter estimation tractable. The EM algorithm is no longer guaranteed to converge with an intractable posterior, but experiments show the approximate E and M Steps consistently converge to a local solution. Empirical results on a diverse array of images show that learning image feature clusters using a standard mixture model, feature weighting using Bayesian shrinkage priors and spatial context potentials considerably improve the accuracy of general object recognition. Moreover, results suggest that good performance can be obtained without expensive image segmentations.

Item Metadata

Title	Unsupervised statistical models for general object recognition
Creator	Carbonetto, Peter
Publisher	University of British Columbia
Date Issued	2003
Description	We approach the object recognition problem as the process of attaching meaningful labels to specific regions of an image. Given a set of images and their captions, we segment the images, in either a crude or sophisticated fashion, then learn the proper associations between words and regions. Previous models are limited by the scope of the representation, and performance is constrained by noise from poor initial clusterings of the image features. We propose three improvements that address these issues. First, we describe a model that incorporates clustering into the learning step using a basic mixture model. Second, we propose Bayesian priors on the mixture model to stabilise learning and automatically weight features. Third, we develop a more expressive model that learns spatial relations between regions of a scene. Using the analogy of building a lexicon via an aligned bitext, we formulate a probabilistic mapping between the image feature vectors and the supplied word tokens. To find the best hypothesis, we hill-climb the log-posterior using the EM algorithm. Spatial context introduces cycles to our probabilistic graphical model, so we use loopy belief propagation to compute the expectation of the complete log-posterior, and iterative scaling and iterative proportional fitting on the pseudo-likelihood approximation to render parameter estimation tractable. The EM algorithm is no longer guaranteed to converge with an intractable posterior, but experiments show the approximate E and M Steps consistently converge to a local solution. Empirical results on a diverse array of images show that learning image feature clusters using a standard mixture model, feature weighting using Bayesian shrinkage priors and spatial context potentials considerably improve the accuracy of general object recognition. Moreover, results suggest that good performance can be obtained without expensive image segmentations.
Extent	4034738 bytes
Genre	Thesis/Dissertation
Type	Text
File Format	application/pdf
Language	eng
Date Available	2009-11-02
Provider	Vancouver : University of British Columbia Library
Rights	For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.
DOI	10.14288/1.0051532
URI	http://hdl.handle.net/2429/14543
Degree	Master of Science - MSc
Program	Computer Science
Affiliation	Science, Faculty of; Computer Science, Department of
Degree Grantor	University of British Columbia
Graduation Date	2003-11
Campus	UBCV
Scholarly Level	Graduate
Aggregated Source Repository	DSpace

Item Media

ubc_2003-0619.pdf -- 3.85MB

Item Citations and Data

Rights

For non-commercial purposes only, such as research, private study and education. Additional conditions apply, see Terms of Use https://open.library.ubc.ca/terms_of_use.