Paper ID:
Title: Predicting Citrullination Sites in Protein Sequences Using mRMR Method and Random Forest Algorithm
Authors: Zhang, Q; Sun, XJ; Feng, KY; Wang, SP; Zhang, YH; Wan, SB; Lu, L; Cai, YD
Corresponding author: Cai, YD (reprint author), Shanghai Univ, Sch Life Sci, Shanghai 200444, Peoples R China.
External author affiliations:
Publication year: 2017
Volume: 20
Issue: 2
Pages: 164-173
Abstract: Background: As one of the essential post-translational modifications (PTMs), citrullination (deimination) of an arginine residue changes the molecular weight and electrostatic charge of its side chain. Citrullination in protein sequences is catalyzed by a Ca2+-dependent enzyme family called peptidylarginine deiminase (PAD), which includes five isotypes: PAD1, 2, 3, 4/5, and 6. Citrullinated proteins participate in many biological processes; for example, the citrullination of myelin basic protein (MBP) assists the early development of the central nervous system. However, abnormal modifications of citrullinated proteins can also lead to severe human diseases, including multiple sclerosis and rheumatoid arthritis. Objective: It is therefore necessary and important to identify the citrullination sites in protein sequences. Information about the location of citrullination sites in protein sequences will be useful for investigating the molecular functions and disease mechanisms related to citrullinated proteins. Materials and Methods: In this study, we investigated peptide segments that contain citrullination sites at their centers, which were encoded numerically from four aspects. This yielded a training set with 116 positive samples and 232 negative samples. Then, a reliable feature selection technique, called maximum-relevance-minimum-redundancy (mRMR), was applied to analyze these features, and four algorithms, including random forest (RF), Dagging, nearest neighbor algorithm (NNA), and support vector machine (SVM), together with the incremental feature selection (IFS) method, were adopted to extract important features. Results: Finally, an optimal classifier derived from the RF algorithm was constructed to predict citrullination sites. The 44 most prominent features were comprehensively analyzed, and their biological characteristics in citrullination catalysis were also revealed.
Conclusion: We believe that the biological features obtained in this pioneering work provide useful insights into the formation and function of citrullination, and that the optimal classifier could be a useful tool for identifying citrullination sites in protein sequences.
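The incremental feature selection (IFS) step named in the abstract can be illustrated with a minimal sketch. It assumes the features have already been ranked (for example by mRMR) and that some scoring routine is available; the abstract does not state which performance measure the authors used, so `evaluate` is a hypothetical placeholder, and the feature names and toy scoring function below are invented purely for illustration.

```python
from typing import Callable, List, Sequence, Tuple


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float]:
    """Score every prefix of a ranked feature list and return the best one.

    `evaluate` is a hypothetical callback: in practice it would train a
    classifier (e.g. a random forest) on the given features and return a
    cross-validated performance score.
    """
    if not ranked_features:
        raise ValueError("ranked_features must not be empty")

    best_subset: List[str] = []
    best_score = float("-inf")
    # Grow the feature set one ranked feature at a time.
    for k in range(1, len(ranked_features) + 1):
        subset = list(ranked_features[:k])
        score = evaluate(subset)
        # Strict comparison: on ties, the smaller subset is kept.
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


def toy_score(subset: Sequence[str]) -> float:
    """Invented stand-in for classifier performance (illustration only)."""
    useful = {"f1", "f2", "f3"}
    hits = sum(1 for name in subset if name in useful)
    noise = len(subset) - hits
    return hits - 0.5 * noise


# Hypothetical ranking, as if produced by a prior mRMR step.
ranked = ["f1", "f2", "n1", "f3", "n2"]
best_subset, best_score = incremental_feature_selection(ranked, toy_score)
```

With a real classifier, `evaluate` would dominate the running time, since one model is trained and validated per prefix of the ranked list.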
Journal: COMBINATORIAL CHEMISTRY & HIGH THROUGHPUT SCREENING
Full text:
Full-text link:
Other notes:
Subject categories: Biochemical Research Methods; Chemistry, Applied; Pharmacology & Pharmacy
Impact factor: 0.952
First author's department:
Source:
Document type: Article
Contributing authors:
 
Copyright 2014 Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences. All rights reserved.