Abstract
In gradient-based optimisation, the derivative of an example’s loss can be interpreted as that example’s effect on the model update. Consequently, the magnitude of this derivative implicitly defines a weighting scheme over training examples, and in this sense example weighting is universal in deep learning. Partly motivated by recent work on the risky memorisation behaviour of deep neural networks (Arpit et al., 2017; Zhang et al., 2017b), example weighting has become an active research field (Chang et al., 2017; Toneva et al., 2019). Example weighting has ‘hard’ and ‘soft’ versions: (1) ‘hard’ weighting, i.e., binary weighting, is well known as sample selection or mining; (2) ‘soft’ weighting differentiates examples using a continuous importance score.
In this thesis, we study how to learn more robust and discriminative representations using deep supervised learning. Technically, we propose example weighting methods for better optimisation and regularisation. Example weighting techniques differentiate and weight training data points according to a criterion, which varies across scenarios. Example weighting substantially improves generalisation performance, as we demonstrate empirically across multiple network architectures and learning tasks. We focus on two learning tasks in this thesis: learning to rank and learning to classify. In both tasks, we reveal the importance of example weighting, by which a deep model focuses on more informative patterns and pays less attention to non-informative (easy) and noisy (usually extremely hard) ones during learning. Example weighting is therefore an important tool for guiding deep models to treat training samples differentially and to learn meaningful patterns robustly and effectively.
Furthermore, our study of example weighting helps us better understand the training data and a model’s learning process. When a training dataset is clean, naively assigning higher weights to harder examples works well. However, when the dataset contains both meaningful and wrong information, a model learns meaningful patterns before fitting random errors. The challenge then becomes how to differentiate trusted patterns from erroneous ones as training progresses, and how to avoid fitting the latter. We demonstrate that example weighting is an effective approach for addressing this challenge. Additionally, we empirically demonstrate the effectiveness of our proposed example weighting methods in other adverse cases: (1) in-distribution anomalies, e.g., label noise; (2) out-of-distribution anomalies, e.g., input with no object of interest; (3) sample imbalance.
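The opening observation of the abstract, that the magnitude of an example's loss derivative acts as an implicit weight, can be illustrated with a minimal sketch. It assumes softmax cross-entropy as the loss and takes the L1 norm of the gradient with respect to the logits as the magnitude; both choices, along with the function names and the `threshold` parameter, are assumptions made for illustration and are not taken from the thesis. Under these assumptions the implicit weight equals 2(1 − p_y), where p_y is the predicted probability of the true class, so harder examples receive larger weights.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ce_logit_gradient(logits, label):
    """Gradient of softmax cross-entropy w.r.t. the logits: p - onehot(label)."""
    probs = softmax(logits)
    return [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]


def implicit_weight(logits, label):
    """'Soft' weight: L1 norm of the logit gradient, equal to 2 * (1 - p_label)."""
    return sum(abs(g) for g in ce_logit_gradient(logits, label))


def hard_weight(logits, label, threshold=1.0):
    """'Hard' (binary) weight: hypothetical selection rule that keeps an
    example only if its implicit weight exceeds an assumed threshold."""
    return 1.0 if implicit_weight(logits, label) > threshold else 0.0


if __name__ == "__main__":
    easy = [4.0, 0.0, 0.0]  # confidently correct for label 0
    hard = [0.0, 4.0, 0.0]  # confidently wrong for label 0
    for name, logits in (("easy", easy), ("hard", hard)):
        print(name, implicit_weight(logits, 0), hard_weight(logits, 0))
```

This implicit weighting grows monotonically with difficulty, which matches the clean-data case described above. It does not down-weight extremely hard, possibly noisy examples; that requires a different derivative magnitude function.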
Date of Award | Dec 2020 |
---|---|
Original language | English |
Awarding Institution | |
Sponsors | Anyvision (NI) Ltd |
Supervisor | Neil Robertson (Supervisor) & Yang Hua (Supervisor) |
Keywords
- Deep metric learning
- robust deep learning
- semi-supervised learning
- missing labels
- noisy labels
- regularisation
- overfitting
- sample imbalance
- example weighting
- discriminative representation learning