Feature Extraction Based on Stacked Auto Encoder for Protein Secondary Structure Prediction

WCSE 2017
ISBN: 978-981-11-3671-9 DOI: 10.18178/wcse.2017.06.233

Yehong Che, Jinyong Cheng, Yihui Liu

Abstract— In this paper, a novel sequence feature extraction method based on a deep learning network is proposed for protein secondary structure prediction. The architecture consists mainly of a two-layer stacked auto encoder and a fully connected softmax classifier. Position-specific scoring matrix (PSSM) profiles are used as the raw data for feature extraction. The stacked auto encoder learns second-order feature parameters from a large set of PSSM profiles of polypeptides without using secondary structure labels, which generally improves the performance of the encoder. Compared with the original PSSM profiles, the extracted features reflect not only the evolutionary information but also the sequence interactions between residues. Finally, the extracted features are fed into a fully connected softmax layer, which acts as the classifier for secondary structure prediction. The experimental results indicate that this method achieves an overall accuracy (Q3) above 78% on 25PDB. This is comparable with that of state-of-the-art PSSM+SVM methods, while requiring a relatively short prediction time.

Index Terms— Sparse auto-encoder, Stacked auto encoder, Protein secondary structure prediction, Deep learning neural network.
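
The pipeline described in the abstract (PSSM profile, two stacked encoder layers, softmax classifier, Q3 score) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the window length of 13, the hidden sizes of 100 and 50, the sigmoid activation, the random synthetic PSSM, and all function names are assumptions made for illustration, and the weights are left untrained (the unsupervised pretraining and fine-tuning steps are omitted). It only shows how data would flow through such a network and how Q3, the fraction of residues assigned the correct one of three states, is computed.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(z):
    # Subtract the row maximum for numerical stability.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def sliding_windows(pssm, window):
    """Flatten a window of PSSM rows centred on each residue (zero-padded at the ends)."""
    half = window // 2
    padded = np.pad(pssm, ((half, half), (0, 0)))
    return np.stack([padded[i:i + window].ravel() for i in range(len(pssm))])

def init_layer(rng, n_in, n_out):
    # Small random weights and zero biases; a real model would learn these.
    return rng.normal(0.0, 0.1, (n_in, n_out)), np.zeros(n_out)

def forward(x, encoders, classifier):
    """Pass inputs through the stacked encoder layers, then the softmax layer."""
    h = x
    for W, b in encoders:
        h = sigmoid(h @ W + b)
    Wc, bc = classifier
    return softmax(h @ Wc + bc)

def q3(pred, true):
    """Three-state accuracy: share of residues whose predicted class is correct."""
    return float(np.mean(np.asarray(pred) == np.asarray(true)))

rng = np.random.default_rng(0)
pssm = rng.normal(size=(30, 20))        # synthetic stand-in: 30 residues x 20 amino acids
x = sliding_windows(pssm, window=13)    # assumed window length; shape (30, 260)
encoders = [init_layer(rng, 260, 100),  # assumed hidden sizes
            init_layer(rng, 100, 50)]
classifier = init_layer(rng, 50, 3)     # three states: helix, strand, coil
probs = forward(x, encoders, classifier)
pred = probs.argmax(axis=1)             # predicted state index per residue
```

With untrained weights the predictions are meaningless; the sketch is only meant to make the layer shapes and the Q3 definition concrete.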

Yehong Che, Jinyong Cheng, Yihui Liu
¹Institute of Intelligent Information Processing, ²School of Printing & Packaging, Qilu University of Technology, China


Cite: Yehong Che, Jinyong Cheng, Yihui Liu, "Feature Extraction Based on Stacked Auto Encoder for Protein Secondary Structure Prediction," Proceedings of 2017 the 7th International Workshop on Computer Science and Engineering, pp. 1345-1352, Beijing, 25-27 June, 2017.
