- Library Home /
- Search Collections /
- Open Collections /
- Browse Collections /
- UBC Theses and Dissertations /
- Global optimization of clustering problems
Open Collections
UBC Theses and Dissertations
UBC Theses and Dissertations
Global optimization of clustering problems Shi, Mingfei
Abstract
Clustering is a fundamental unsupervised machine learning task that aims to aggregate similar data into one cluster and separate those in diverse into different clusters. Cluster analysis can always be formulated as an optimization problem. Various objective functions may lead to different clustering problems. In this thesis, we concentrate on k-means and k-center problems. Each can be formulated as a mixed-integer nonlinear programming problem. The work about k-means clustering optimization has been published on ICML 2021 [30]. Moreover, we also submitted the work about global optimization of k-center clustering to ICML 2022 and the paper has been accepted in Phase 1 of reviewing. This thesis provides a practical global optimization algorithm for these two tasks based on a reduced-space spatial branch and bound (BB) scheme. This algorithm can guarantee convergence to the global optimum by only branching on the centers of clusters, which is independent of the dataset’s cardinality. We also design several methods to construct lower and upper bounds at each node in the BB scheme. In addition, for k-center problem, a set of feasibility-based bounds tightening techniques are proposed to narrow down the domain of centers and significantly accelerate the convergence. To demonstrate the capacity of this algorithm, we present computational results on UCI datasets and compare our proposed algorithms with the off-the-shelf global optimal solvers and classical local optimal algorithms. For k-means clustering, the numerical experiments demonstrated our algorithm’s ability to handle datasets with up to 200,000 samples. Besides, for k-center clustering, the serial implementation of the algorithm on the dataset with 14 million samples and 3 features can attain the global optimum to an optimality gap of 0.1% within 2 hours.
Item Metadata
Title |
Global optimization of clustering problems
|
Creator | |
Supervisor | |
Publisher |
University of British Columbia
|
Date Issued |
2022
|
Description |
Clustering is a fundamental unsupervised machine learning task that aims to aggregate similar data into one cluster and separate those in diverse into different clusters. Cluster analysis can always be formulated as an optimization problem. Various objective functions may lead to different clustering problems. In this thesis, we concentrate on k-means and k-center problems. Each can be formulated as a mixed-integer nonlinear programming problem. The work about k-means clustering optimization has been published on ICML 2021 [30]. Moreover, we also submitted the work about global optimization of k-center clustering to ICML 2022 and the paper has been accepted in Phase 1 of reviewing. This thesis provides a practical global optimization algorithm for these two tasks based on a reduced-space spatial branch and bound (BB) scheme. This algorithm can guarantee convergence to the global optimum by only branching on the centers of clusters, which is independent of the dataset’s cardinality. We also design several methods to construct lower and upper bounds at each node in the BB scheme. In addition, for k-center problem, a set of feasibility-based bounds tightening techniques are proposed to narrow down the domain of centers and significantly accelerate the convergence. To demonstrate the capacity of this algorithm, we present computational results on UCI datasets and compare our proposed algorithms with the off-the-shelf global optimal solvers and classical local optimal algorithms. For k-means clustering, the numerical experiments demonstrated our algorithm’s ability to handle datasets with up to 200,000 samples. Besides, for k-center clustering, the serial implementation of the algorithm on the dataset with 14 million samples and 3 features can attain the global optimum to an optimality gap of 0.1% within 2 hours.
|
Genre | |
Type | |
Language |
eng
|
Date Available |
2022-04-25
|
Provider |
Vancouver : University of British Columbia Library
|
Rights |
Attribution-NonCommercial-NoDerivatives 4.0 International
|
DOI |
10.14288/1.0413021
|
URI | |
Degree | |
Program | |
Affiliation | |
Degree Grantor |
University of British Columbia
|
Graduation Date |
2022-05
|
Campus | |
Scholarly Level |
Graduate
|
Rights URI | |
Aggregated Source Repository |
DSpace
|
Item Media
Item Citations and Data
Rights
Attribution-NonCommercial-NoDerivatives 4.0 International