Abstract
Method: We present PhosphoGAN, a new semi-supervised generative adversarial approach to training a deep neural network. PhosphoGAN produces a classifier that takes raw peptide sequence data as input and classifies it using both a convolutional neural network (CNN) and a two-dimensional bidirectional long short-term memory (LSTM) attention mechanism. We show that, given the lack of training data, a semi-supervised training scheme based on a generative adversarial network (GAN) can be adopted. Training the classifier adversarially yields a far more accurate classifier, because it gains a new source of training data.
Results: The phosphorylation data used in the experiment are for Homo sapiens and were gathered from UniProt/Swiss-Prot. The positive data consisted of the annotated phosphorylation sites on serine (S), threonine (T) and tyrosine (Y) residues, while the negative data were drawn from the same amino acids in the same proteins, excluding the annotated phosphorylation sites. To evaluate the performance of both models (PhosphoGAN and MusiteDeep), five-fold cross-validation was used, and the area under the receiver operating characteristic curve, the average precision and the F1 score were calculated for each fold. PhosphoGAN outperformed MusiteDeep in both general phosphorylation site prediction and kinase-specific prediction.
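The labelling scheme described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `label_candidate_sites`, the flanking-window size `flank` and the padding character are assumptions introduced here for illustration only. The sketch marks every S, T or Y residue in a protein as positive if its position is annotated as a phosphorylation site and as negative otherwise.

```python
from typing import List, Set, Tuple

# Residues that can carry the phosphorylation sites considered in the abstract.
TARGET_RESIDUES = {"S", "T", "Y"}


def label_candidate_sites(
    sequence: str,
    annotated: Set[int],
    flank: int = 3,   # assumed window half-width, not taken from the source
    pad: str = "-",   # assumed padding symbol for sites near the termini
) -> List[Tuple[int, str, str, int]]:
    """Return (position, residue, window, label) for every S/T/Y residue.

    Positions are 1-based. The label is 1 when the position is in
    `annotated` (positive example) and 0 otherwise (negative example).
    """
    padded = pad * flank + sequence + pad * flank
    sites = []
    for i, residue in enumerate(sequence):
        if residue not in TARGET_RESIDUES:
            continue
        position = i + 1
        # In `padded`, the residue sits at index i + flank, so this slice
        # is centred on it and spans `flank` residues on each side.
        window = padded[i : i + 2 * flank + 1]
        label = 1 if position in annotated else 0
        sites.append((position, residue, window, label))
    return sites


# Toy sequence with hypothetical annotations at positions 2 and 5.
example = label_candidate_sites("MSTAYKS", annotated={2, 5}, flank=2)
for site in example:
    print(site)
```

Because every unannotated S, T or Y becomes a negative example, negatives typically outnumber positives, which is consistent with the unbalanced data mentioned in the abstract.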
Conclusion: By applying a new semi-supervised training approach along with a new model architecture for the classifier, we obtain results that outperform the current state-of-the-art MusiteDeep model. These results demonstrate how deep learning can be applied to significant effect on a problem where the training data are insufficient and unbalanced.
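The evaluation protocol named in the Results (five-fold cross-validation with area under the ROC curve, average precision and F1 score per fold) can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' evaluation code: the function names, the round-robin fold assignment and the 0.5 decision threshold are assumptions. With unbalanced data, a stratified split would probably be preferable to the simple split shown here.

```python
from typing import Iterator, List, Sequence, Tuple


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random positive outscores a random negative
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute the AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Mean of the precision at the rank of each positive, ranking by
    descending score (ties keep their input order)."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; sample i is tested in fold i % k."""
    for fold in range(k):
        test = [i for i in range(n) if i % k == fold]
        train = [i for i in range(n) if i % k != fold]
        yield train, test


# Toy labels and classifier scores for a single held-out fold.
y_true = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.4, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(roc_auc(y_true, scores))            # 0.75
print(average_precision(y_true, scores))  # (1/1 + 2/3) / 2
print(f1_score(y_true, y_pred))           # 0.5
```

In a full run, each `(train, test)` pair from `k_fold_indices` would be used to fit the model on the training indices and to compute the three metrics on the held-out indices, giving one value per metric per fold.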
Original language | English |
---|---|
Publication status | Published - 15 Oct 2018 |
Event | 30th Anniversary AACR Special Conference Convergence: Artificial Intelligence, Big Data, and Prediction in Cancer - Newport, United States Duration: 14 Oct 2018 → … https://www.aacr.org/Meetings/Pages/MeetingDetail.aspx?EventItemID=149&DetailItemID=847#.W2A1_GLTXDu |
Conference
Conference | 30th Anniversary AACR Special Conference Convergence: Artificial Intelligence, Big Data, and Prediction in Cancer |
---|---|
Country/Territory | United States |
City | Newport |
Period | 14/10/2018 → … |
Fingerprint
Dive into the research topics of 'PhosphoGAN: Enhancing the prediction process of general and kinase-specific phosphorylation sites'. Together they form a unique fingerprint.
Student theses
- Deep learning of proteomics data
  Lennox, M. (Author), Robertson, N. (Supervisor) & Devereux, B. (Supervisor), Dec 2021
  Student thesis: Doctoral Thesis › Doctor of Philosophy