Prediction of carbohydrate-binding proteins from sequences using support vector machines

Kentaro Shimizu, Seizi Someya, Masanori Kakuta, Mizuki Morita, Kazuya Sumikoshi, Wei Cao, Zhenyi Ge, Osamu Hirose, Shugo Nakamura, Tohru Terada

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

Carbohydrate-binding proteins are proteins that can interact with sugar chains but do not modify them. They are involved in many physiological functions, and we have developed a method for predicting them from their amino acid sequences. The method is based on support vector machines (SVMs). We first clarified the definition of carbohydrate-binding proteins and then constructed the positive and negative datasets on which the SVMs were trained. In a leave-one-out test on these datasets, the method achieved an area under the receiver operating characteristic (ROC) curve of 0.92. We also examined two amino acid grouping methods that enable effective learning of sequence patterns and evaluated their performance. When we applied our method in combination with a homology-based prediction method to the annotated human genome database, H-invDB, the true positive rate of prediction improved.
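The evaluation pipeline described in the abstract (amino acid grouping, leave-one-out testing, area under the ROC curve) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the six residue groups are a generic physicochemical partition, not the groupings examined in the article; a nearest-centroid scorer stands in for the SVM so that the example needs only the standard library; and the toy sequences are invented.

```python
from collections import Counter

# Hypothetical reduced alphabet (assumption): a rough physicochemical
# partition of the 20 amino acids, NOT the grouping used in the article.
GROUPS = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]
AA_TO_GROUP = {aa: idx for idx, members in enumerate(GROUPS) for aa in members}


def group_composition(seq):
    """Fraction of residues in each group: a fixed-length feature vector."""
    counts = Counter(AA_TO_GROUP[aa] for aa in seq.upper() if aa in AA_TO_GROUP)
    total = sum(counts.values())
    return [counts[i] / total if total else 0.0 for i in range(len(GROUPS))]


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def leave_one_out_scores(features, labels):
    """Score each sample with a model fitted on all the other samples.

    The model is a nearest-centroid scorer (a stand-in for the SVM):
    a higher score means the sample lies closer to the positive centroid.
    Each class needs at least two samples so neither centroid is empty.
    """
    scores = []
    for i, x in enumerate(features):
        train = [(f, y) for j, (f, y) in enumerate(zip(features, labels)) if j != i]
        pos_c = centroid([f for f, y in train if y == 1])
        neg_c = centroid([f for f, y in train if y == 0])
        scores.append(sq_dist(x, neg_c) - sq_dist(x, pos_c))
    return scores


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Invented toy sequences (assumption), chosen to be trivially separable.
positives = ["KKKR", "RRKK", "KRKR"]
negatives = ["DDEE", "EEDD", "DEDE"]
features = [group_composition(s) for s in positives + negatives]
labels = [1] * len(positives) + [0] * len(negatives)
auc = roc_auc(labels, leave_one_out_scores(features, labels))
print(auc)  # the toy data are perfectly separable, so this prints 1.0
```

Grouping residues shrinks the feature space from 20 symbols to a handful of classes, which is one plausible reason such groupings can make sequence patterns easier to learn from small datasets. With a real SVM library, only `leave_one_out_scores` would need to change.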

Original language: English
Article number: 289301
Journal: Advances in Bioinformatics
Volume: 2010
DOI: https://doi.org/10.1155/2010/289301
Publication status: Published - 2010
Externally published: Yes

Fingerprint

Carbohydrates
Support vector machines
Amino acids
Amino Acids
Sugars
Genes
Proteins
Human Genome
Carrier Proteins
saccharide-binding proteins
Support Vector Machine
ROC Curve
Amino Acid Sequence
Learning
Databases

ASJC Scopus subject areas

  • Computer Science Applications
  • Biochemistry, Genetics and Molecular Biology (miscellaneous)
  • Biomedical Engineering

Cite this

Prediction of carbohydrate-binding proteins from sequences using support vector machines. / Shimizu, Kentaro; Someya, Seizi; Kakuta, Masanori; Morita, Mizuki; Sumikoshi, Kazuya; Cao, Wei; Ge, Zhenyi; Hirose, Osamu; Nakamura, Shugo; Terada, Tohru.

In: Advances in Bioinformatics, Vol. 2010, 289301, 2010.

Shimizu, K, Someya, S, Kakuta, M, Morita, M, Sumikoshi, K, Cao, W, Ge, Z, Hirose, O, Nakamura, S & Terada, T 2010, 'Prediction of carbohydrate-binding proteins from sequences using support vector machines', Advances in Bioinformatics, vol. 2010, 289301. https://doi.org/10.1155/2010/289301
@article{904db63f2e49443cb1f35ad22325ab7f,
title = "Prediction of carbohydrate-binding proteins from sequences using support vector machines",
author = "Kentaro Shimizu and Seizi Someya and Masanori Kakuta and Mizuki Morita and Kazuya Sumikoshi and Wei Cao and Zhenyi Ge and Osamu Hirose and Shugo Nakamura and Tohru Terada",
year = "2010",
doi = "10.1155/2010/289301",
language = "English",
volume = "2010",
journal = "Advances in Bioinformatics",
issn = "1687-8027",
publisher = "Hindawi Publishing Corporation",
}

TY - JOUR

T1 - Prediction of carbohydrate-binding proteins from sequences using support vector machines

AU - Shimizu, Kentaro

AU - Someya, Seizi

AU - Kakuta, Masanori

AU - Morita, Mizuki

AU - Sumikoshi, Kazuya

AU - Cao, Wei

AU - Ge, Zhenyi

AU - Hirose, Osamu

AU - Nakamura, Shugo

AU - Terada, Tohru

PY - 2010

Y1 - 2010

UR - http://www.scopus.com/inward/record.url?scp=78349235041&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78349235041&partnerID=8YFLogxK

U2 - 10.1155/2010/289301

DO - 10.1155/2010/289301

M3 - Article

C2 - 20936154

AN - SCOPUS:78349235041

VL - 2010

JO - Advances in Bioinformatics

JF - Advances in Bioinformatics

SN - 1687-8027

M1 - 289301

ER -