LCSL: Long-tailed Classification via Self-labeling

Duc Quang Vu, Trang T.T. Phung, Jia Ching Wang*, Son T. Mai

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

Abstract

Over the last decades, deep learning (DL) has proven to be a powerful and successful technique in many real-world applications, e.g., video surveillance and object detection. However, when class label distributions are highly skewed, DL classifiers tend to be biased towards the majority classes during training. This leads to poor generalization on minority classes and consequently reduces overall accuracy. How to deal effectively with such long-tailed class distributions in DL, i.e., deep long-tailed classification (DLC), remains a challenging problem despite many research efforts. Among various approaches, data augmentation, which generates additional samples to reduce label imbalance, is the most common and practical one. However, simply relying on existing class-agnostic augmentation strategies without properly accounting for label differences can worsen the problem, since more head-class samples may inevitably be augmented than tail-class ones. Moreover, none of the existing works considers the quality and suitability of augmented samples during training. Our proposed approach, called Long-tailed Classification via Self-Labeling (LCSL), is specifically designed to address these limitations. LCSL differs fundamentally from existing works in that it iteratively exploits the preceding network during training to re-label augmented samples and uses the output confidence to decide whether new samples belong to minority classes before adding them to the training data. This not only reduces the imbalance ratios among classes, but also reduces prediction uncertainty for minority classes by adding only confidently labeled samples to the data. This incremental learning and generation scheme thus provides a new, robust way to decrease model over-fitting and enhance overall accuracy, especially for minority classes. Extensive experiments demonstrate that LCSL outperforms state-of-the-art long-tailed learning techniques on various standard benchmark datasets. Specifically, LCSL obtains 85.8%, 54.4%, and 56.2% accuracy on CIFAR10-LT, CIFAR100-LT, and ImageNet-LT (with moderate to extreme imbalance ratios), respectively. The source code is available at https://github.com/vdquang1991/lcsl/.
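The selection mechanism the abstract describes, re-labeling augmented samples with the preceding network and keeping only those confidently predicted as minority classes, can be sketched in a few lines of PyTorch. This is a minimal illustration under assumptions, not the paper's implementation: the names prev_model, tail_classes, and conf_threshold are hypothetical, and LCSL's actual selection criteria may differ (see the linked repository).

    # Minimal sketch of confidence-based self-labeling for tail classes.
    # prev_model, tail_classes, and conf_threshold are assumed names,
    # not identifiers from the official LCSL code.
    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def select_self_labeled(prev_model, aug_images, tail_classes, conf_threshold=0.9):
        """Re-label augmented samples with the preceding network and keep
        only those confidently predicted as a minority (tail) class."""
        prev_model.eval()
        probs = F.softmax(prev_model(aug_images), dim=1)   # shape: (N, num_classes)
        conf, pseudo_labels = probs.max(dim=1)             # per-sample confidence and label
        is_tail = torch.isin(
            pseudo_labels,
            torch.as_tensor(tail_classes, device=pseudo_labels.device),
        )
        keep = is_tail & (conf >= conf_threshold)          # confident tail-class samples only
        return aug_images[keep], pseudo_labels[keep]

In a full training loop, the retained samples and their pseudo-labels would be appended to the training set before the next epoch, gradually raising the effective number of tail-class examples while filtering out low-confidence augmentations.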

Original language: English
Journal: IEEE Transactions on Circuits and Systems for Video Technology
Early online date: 02 Jul 2024
Publication status: Early online date - 02 Jul 2024

Bibliographical note

Publisher Copyright: IEEE

Keywords

  • Accuracy
  • Brain modeling
  • Data augmentation
  • Data models
  • Image Classification
  • Imbalance Classification
  • Long-tailed problem
  • Predictive models
  • Self-labeling
  • Training
  • Training data

ASJC Scopus subject areas

  • Media Technology
  • Electrical and Electronic Engineering
