UBC Theses and Dissertations


Object categorization and localization with spatially localized features
McCann, Sancho


Object classification and localization are important components of image understanding. For a computer to interact with our world, it will need to identify the objects in our world. At a more basic level, these tasks are crucial to many practical applications: image organization, visual search, autonomous vehicles, and surveillance. This thesis presents alternatives to the currently popular approaches to object classification and localization, specifically focusing on methods that more tightly integrate location information with the visual features. We start by improving on Naive Bayes Nearest Neighbor (NBNN), an alternative to the standard bag-of-words/spatial pyramid classification pipeline. This model matches localized features between a test image and the entire training set in order to classify an image as belonging to one of several categories. We improve this method’s classification performance and algorithmic complexity. However, the nature of NBNN results in prohibitive memory requirements on large datasets. This leads to our second contribution: a bag-of-words model based on a clustering of the location-augmented features. This is a simpler and more flexible approach to modeling location information than the commonly used spatial pyramid. By using location-augmented features, location information is captured simply in the nearest-neighbor coding of the bag-of-words model. This results in a more efficient use of model dimensions than the spatial pyramid and higher classification performance than state-of-the-art alternatives. Last, we present the design of an object localization system using this high-performance classifier. This design is made more difficult by the fact that our model does not satisfy the assumptions made by recent efficient localization algorithms. Our method uses a Hough transform based on an approximation to our model, followed by a more accurate refinement of classifier scores and bounding boxes.
We show its effectiveness on the widely used and practical Daimler monocular pedestrian dataset. These contributions show that simple, location-augmented features, soft nearest-neighbor coding, and linear Support Vector Machines (SVMs) can outperform the long-used and optimized spatial pyramid methods, and that this approach warrants additional research to continue to improve its efficiency.
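
The idea of location-augmented features in a bag-of-words model can be illustrated with a minimal sketch. This is not the thesis implementation: the function names (`augment`, `nearest_codeword`, `bow_histogram`), the `weight` parameter, and the hand-written two-word codebook are all assumptions made for illustration. The abstract describes a codebook obtained by clustering and soft nearest-neighbor coding; the sketch substitutes a fixed codebook and hard assignment to keep the example short.

```python
import math


def augment(descriptor, x, y, width, height, weight=1.0):
    """Append normalized image coordinates to a local descriptor.

    `weight` (hypothetical) controls how strongly location influences
    the nearest-neighbor search relative to appearance.
    """
    return list(descriptor) + [weight * x / width, weight * y / height]


def nearest_codeword(feature, codebook):
    """Return the index of the closest codeword (squared Euclidean distance)."""
    best_index, best_dist = 0, math.inf
    for i, word in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(feature, word))
        if dist < best_dist:
            best_index, best_dist = i, dist
    return best_index


def bow_histogram(features, codebook):
    """Hard-assign each feature to its nearest codeword and normalize counts."""
    counts = [0] * len(codebook)
    for feature in features:
        counts[nearest_codeword(feature, codebook)] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


# Two codewords with identical appearance but different locations
# (upper-left versus lower-right region of the image).
codebook = [[1.0, 0.0, 0.25, 0.25], [1.0, 0.0, 0.75, 0.75]]

# Three features with the same appearance at different pixel positions
# in a 100 x 100 image.
features = [
    augment([1.0, 0.0], 10, 10, 100, 100),
    augment([1.0, 0.0], 20, 30, 100, 100),
    augment([1.0, 0.0], 90, 80, 100, 100),
]
histogram = bow_histogram(features, codebook)
```

Because the coordinates are part of each feature vector, identical appearance at different positions falls into different histogram bins, so location is encoded by the coding step itself with no separate spatial pyramid grid. The resulting histogram would then be passed to a classifier such as a linear SVM.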



Attribution 2.5 Canada