UBC Theses and Dissertations

Symmetric implicit geometries under label-imbalances: from cross-entropy to supervised contrastive loss

Vakilian, Vala

Abstract

Class imbalance has long been a central challenge in machine learning. As models grow and reach interpolation capacity (i.e., fully fit the training data), traditional remedies such as loss reweighting or data resampling lose effectiveness. While newer methods target imbalanced learning in the over-parameterized regime, their behavior in deep learning remains unclear. Recent studies reveal the Neural Collapse (NC) phenomenon in over-parameterized networks: penultimate-layer features collapse to class-representative vectors that align with the last-layer classifiers, forming a symmetric geometry with maximal class separation. Initially an experimental observation, NC has since been analyzed theoretically through the Unconstrained Feature Model (UFM), which focuses on the last layer and treats features as free optimization variables, assuming the network is expressive enough to produce them; these analyses have mostly assumed balanced datasets.

In this thesis, we extend the UFM framework to analyze the Cross-Entropy (CE) and Supervised Contrastive (SCL) losses under label imbalance. We introduce the SELI (Simplex-Encoded-Labels Interpolation) framework to characterize the classifier/feature geometry under CE loss, explaining the symmetric geometry on balanced datasets and predicting the geometry in imbalanced cases. This leads us to identify the Minority Collapse phenomenon, in which minority-class classifiers collapse onto one another, shrinking margins and degrading classification performance. We then explore logit-adjusted variants of CE loss that improve accuracy by favoring minority margins, study the geometries they induce, and propose hyperparameters that recover the symmetric geometry while enhancing minority margins.

For SCL, we show that adding a non-negativity constraint on features, such as a ReLU activation, restores the symmetric geometry regardless of imbalance. Our analysis further reveals the importance of batch composition in how features are contrasted, yielding minimal batching requirements for NC-style feature collapse. We suggest batching strategies that aid SCL's convergence to the symmetric geometry, and we propose prototype vectors for fine-tuning the geometry of SCL models trained without non-negativity constraints. Our findings are supported by extensive experiments on standard vision benchmarks.
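
For reference, the symmetric geometry that NC predicts on balanced data is the simplex equiangular tight frame (ETF). A standard parameterization (up to rotation and scaling; k denotes the number of classes) writes the matrix M whose columns are the class-mean directions as

    M = \sqrt{\tfrac{k}{k-1}}\left(I_k - \tfrac{1}{k}\mathbf{1}_k\mathbf{1}_k^\top\right),

so that every column has unit norm and any two distinct columns have inner product -1/(k-1), the most mutually separated configuration achievable by k unit vectors.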
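
As an illustration of the logit-adjusted CE variations, one widely used form is the additive logit adjustment of Menon et al., where f_c(x) is the logit of class c, \pi_c the empirical class prior, and \tau > 0 a tunable temperature:

    \ell(y, f(x)) = -\log \frac{e^{f_y(x) + \tau \log \pi_y}}{\sum_{c=1}^{k} e^{f_c(x) + \tau \log \pi_c}}

Because \log \pi_y is most negative for minority classes, the adjustment forces the model to earn larger margins on them; the choice of \tau is one of the hyperparameters at stake in recovering the symmetric geometry.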
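
To make the SCL setup concrete, here is a minimal numpy sketch of the batch-wise supervised contrastive loss (Khosla et al.) with the non-negativity constraint applied to the features. The function name and the (n, d) feature-array layout are illustrative assumptions, not the thesis code:

    import numpy as np

    def supervised_contrastive_loss(features, labels, tau=0.1):
        """Supervised contrastive loss over one batch, with features passed
        through a ReLU (non-negativity) and L2-normalized onto the sphere."""
        z = np.maximum(features, 0.0)                          # non-negativity constraint
        z = z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)
        sim = z @ z.T / tau                                    # scaled cosine similarities
        n = len(labels)
        total = 0.0
        for i in range(n):
            pos = [p for p in range(n) if p != i and labels[p] == labels[i]]
            if not pos:
                continue                                       # anchor has no positives
            others = [a for a in range(n) if a != i]
            log_denom = np.log(np.sum(np.exp(sim[i, others]))) # log-sum-exp over non-anchors
            total += -np.mean([sim[i, p] - log_denom for p in pos])
        return total / n

Dropping the ReLU line recovers the unconstrained variant, the setting in which the thesis instead proposes prototype vectors for fine-tuning the geometry.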
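
The batching requirements concern which classes get contrasted within a batch. As a sketch of one strategy consistent with the abstract (a hypothetical helper; the thesis's actual schemes may differ), a class-balanced batcher that places every class in every batch could look like:

    import numpy as np

    def class_balanced_batches(labels, per_class, rng=None):
        """Yield index batches containing `per_class` samples from every class,
        so each batch contrasts all class pairs; minority classes are sampled
        with replacement when they have too few examples."""
        rng = rng or np.random.default_rng()
        labels = np.asarray(labels)
        idx = {c: np.flatnonzero(labels == c) for c in np.unique(labels)}
        n_batches = max(len(v) for v in idx.values()) // per_class
        for _ in range(n_batches):
            batch = np.concatenate(
                [rng.choice(ids, size=per_class, replace=len(ids) < per_class)
                 for ids in idx.values()])
            yield rng.permutation(batch)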

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International