Abstract
Deep learning has become an innovative tool for predicting the properties of a protein. However, obtaining an accurate predictive model using deep learning methods typically requires a large amount of labelled data, which is expensive and time-consuming to accumulate. Even when optimised, these algorithms are often black boxes, which makes it challenging to interpret the decision-making processes that lead to the final prediction. Therefore, there is a demand for innovative modelling techniques that overcome these drawbacks within the space of bioinformatic deep learning. To address these issues, we have designed a modelling scheme that utilises techniques from computer vision. Specifically, we explore how triplet networks can form a robust model architecture that is capable of learning and ranking proteins from just a few labelled examples. We evaluate our model on a variety of downstream tasks, including peak absorption wavelength, enantioselectivity, plasma membrane localisation, and thermostability. The embedded representations produced by this method show considerable improvement when compared to previous baseline models. Finally, to emphasise that this is an example of white-box deep learning, we visualised the features produced by the algorithm to gain a better understanding of how the network reaches its prediction for each protein property.
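The abstract does not state the loss function, distance metric, or encoder used, so the following is only a minimal sketch of the standard triplet margin loss that triplet networks are commonly trained with, not the authors' implementation. The Euclidean distance, the margin of 1.0, and the helper names `euclidean`, `triplet_margin_loss`, and `rank_by_similarity` are all assumptions made for illustration. Embeddings are plain lists of floats standing in for the output of a protein encoder.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss (assumed form, not taken from the paper).

    The loss is zero once the anchor is closer to the positive than to the
    negative by at least `margin`; otherwise it grows linearly.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def rank_by_similarity(query, candidates):
    """Return candidate names ordered from nearest to farthest from `query`.

    `candidates` maps a (hypothetical) protein name to its embedding.
    """
    return sorted(candidates, key=lambda name: euclidean(query, candidates[name]))
```

Under these assumptions, a triplet whose positive already lies well inside the margin contributes no loss, and ranking a handful of labelled proteins against a query reduces to sorting by distance in the embedding space.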
Original language | English |
---|---|
Title of host publication | IEEE 2020 International Conference on Machine Learning and Applications |
Publisher | IEEE |
Publication status | Accepted - 16 Sep 2020 |
Event | IEEE 2020 International Conference on Machine Learning and Applications - Duration: 14 Dec 2020 → … https://www.icmla-conference.org/icmla20 |
Conference
Conference | IEEE 2020 International Conference on Machine Learning and Applications |
---|---|
Abbreviated title | ICMLA 2020 |
Period | 14/12/2020 → … |