Deep learning has become an innovative tool for predicting the properties of a protein. However, obtaining an accurate predictive model using deep learning methods typically requires a large amount of labelled data, which is expensive and time-consuming to accumulate. Even when optimised, these algorithms are often black boxes, which makes it challenging to interpret the decision-making processes that lead to the final prediction. Therefore, there is a demand for innovative modelling techniques that overcome these drawbacks within the space of bioinformatic deep learning. To address these issues, we have designed a modelling scheme that utilises techniques from computer vision. Specifically, we explore how triplet networks can form a robust model architecture that is capable of learning and ranking proteins from just a few labelled examples. We evaluate our model on a variety of downstream tasks, including peak absorption wavelength, enantioselectivity, plasma membrane localisation, and thermostability. The embedded representations produced by this method show considerable improvement when compared to previous baseline models. Finally, to emphasise that this is an example of white-box deep learning, we visualised the features produced by the algorithm to gain a better understanding of how the network reaches its prediction for each protein property.
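The abstract does not state the loss function, distance metric, or margin used, so the following is only a minimal sketch of the standard triplet margin loss from the computer vision literature, which is the usual training objective for a triplet network. The function names, the Euclidean distance, the margin of 1.0, and the two-dimensional embeddings are all illustrative assumptions rather than the authors' settings.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet margin loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` further from
    the anchor than the positive is. The margin value is an assumption.
    """
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(0.0, d_ap - d_an + margin)


# Hypothetical 2-D embeddings of three proteins (not data from the paper).
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # similar property value, distance 1 from anchor
negative = [3.0, 4.0]   # dissimilar property value, distance 5 from anchor

loss_easy = triplet_loss(anchor, positive, negative)  # already well separated
loss_hard = triplet_loss(anchor, negative, positive)  # roles swapped, violated
```

In this sketch a triplet contributes to training only while the positive is not yet closer to the anchor than the negative by the chosen margin, which is one reason such objectives are used when few labelled examples are available: each labelled example can appear in many triplets.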
|Title of host publication||IEEE 2020 International Conference on Machine Learning and Applications|
|Publication status||Accepted - 16 Sep 2020|
|Event||IEEE 2020 International Conference on Machine Learning and Applications (Duration: 14 Dec 2020 → …)|
|Conference||IEEE 2020 International Conference on Machine Learning and Applications|
|Abbreviated title||ICMLA 2020|
|Period||14/12/2020 → …|