Abstract
This paper addresses the problem of learning from unbalanced datasets, on which traditional algorithms may perform poorly. Classification algorithms tend to favor the larger, less important classes in such problems. To handle the unbalanced data problem, we synthesize data using variational autoencoders (VAE) on raw training samples and then use various input sources (raw data, and a combination of raw and synthetic data) to train different models. We evaluate our method using multiple criteria on the SVHN dataset, which consists of complex images, and perform a comprehensive comparative analysis of popular CNN architectures on balanced and unbalanced data to determine which performs best under class imbalance. We found that data synthesis via VAE is reliable and robust, and that it can help classify real data with higher accuracy than training on traditional (unbalanced) data. Our results demonstrate the strength of using VAE to solve the class imbalance problem.
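The step of combining raw and synthetic inputs can be illustrated with a minimal sketch. It assumes, purely for illustration, that each minority class is topped up to the size of the largest class; the abstract does not state the mixing ratio used. The names `synthetic_quota`, `balance_with_synthetic`, and `generate` are hypothetical, and `generate` stands in for sampling from a trained VAE decoder.

```python
from collections import Counter


def synthetic_quota(labels):
    """Count how many extra samples each class needs to match the largest class.

    Assumption (not from the paper): the balancing target is the majority-class count.
    """
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


def balance_with_synthetic(samples, labels, generate):
    """Return raw samples plus generated ones so that all classes are equally sized.

    `generate(cls, k)` is a hypothetical callable that returns k synthetic
    samples for class `cls`, e.g. by decoding latent vectors with a VAE.
    """
    out_samples, out_labels = list(samples), list(labels)
    for cls, k in synthetic_quota(labels).items():
        if k > 0:
            out_samples.extend(generate(cls, k))
            out_labels.extend([cls] * k)
    return out_samples, out_labels


def toy_generate(cls, k):
    """Placeholder generator that returns tagged strings instead of images."""
    return [f"synthetic-{cls}-{i}" for i in range(k)]


raw_samples = ["a", "b", "c", "d", "e", "f"]
raw_labels = [0, 0, 0, 1, 2, 2]
mixed_samples, mixed_labels = balance_with_synthetic(raw_samples, raw_labels, toy_generate)
print(Counter(mixed_labels))
```

In a real pipeline, `toy_generate` would be replaced by a sampler backed by a trained VAE, and the mixed set would be passed to the CNN under comparison alongside a raw-only baseline.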
Original language | English |
---|---|
Title of host publication | Analysis of Images, Social Networks and Texts - 8th International Conference, AIST 2019, Revised Selected Papers: Proceedings |
Editors | Wil M.P. van der Aalst, Vladimir Batagelj, Dmitry I. Ignatov, Valentina Kuskova, Sergei O. Kuznetsov, Irina A. Lomazova, Michael Khachay, Andrey Kutuzov, Natalia Loukachevitch, Amedeo Napoli, Panos M. Pardalos, Marcello Pelillo, Andrey V. Savchenko, Elena Tutubalina |
Publisher | Springer |
Pages | 270-281 |
Number of pages | 12 |
ISBN (Print) | 9783030395742 |
Publication status | Published - 02 Feb 2020 |
Externally published | Yes |
Event | 8th International Conference on Analysis of Images, Social Networks and Texts, AIST 2019 - Kazan, Russian Federation. Duration: 17 Jul 2019 → 19 Jul 2019 |
Publication series
Name | Communications in Computer and Information Science |
---|---|
Volume | 1086 CCIS |
ISSN (Print) | 1865-0929 |
ISSN (Electronic) | 1865-0937 |
Conference
Conference | 8th International Conference on Analysis of Images, Social Networks and Texts, AIST 2019 |
---|---|
Country/Territory | Russian Federation |
City | Kazan |
Period | 17/07/2019 → 19/07/2019 |
Bibliographical note
Publisher Copyright: © Springer Nature Switzerland AG 2020.
Keywords
- Convolutional Neural Network (CNN)
- Imbalanced data
- Variational autoencoder (VAE)
ASJC Scopus subject areas
- Computer Science (all)
- Mathematics (all)