TriFLDB: A database of clustered full-length coding sequences from Triticeae with applications to comparative grass genomics

Keiichi Mochida, Takuhiro Yoshida, Tetsuya Sakurai, Yasunari Ogihara, Kazuo Shinozaki

Research output: Contribution to journal › Article

75 Citations (Scopus)

Abstract

The Triticeae Full-Length CDS Database (TriFLDB) contains available information regarding full-length coding sequences (CDSs) of the Triticeae crops wheat (Triticum aestivum) and barley (Hordeum vulgare) and includes functional annotations and comparative genomics features. TriFLDB provides a search interface using keywords for gene function and related Gene Ontology terms and a similarity search for DNA and deduced translated amino acid sequences to access annotations of Triticeae full-length CDS (TriFLCDS) entries. Annotations consist of similarity search results against several sequence databases and domain structure predictions by InterProScan. The deduced amino acid sequences in TriFLDB are grouped with the proteome datasets for Arabidopsis (Arabidopsis thaliana), rice (Oryza sativa), and sorghum (Sorghum bicolor) by hierarchical clustering in stepwise thresholds of sequence identity, providing hierarchical clustering results based on full-length protein sequences. The database also provides sequence similarity results based on comparative mapping of TriFLCDSs onto the rice and sorghum genome sequences, which together with current annotations can be used to predict gene structures for TriFLCDS entries. To provide the possible genetic locations of full-length CDSs, TriFLCDS entries are also assigned to the genetically mapped cDNA sequences of barley and diploid wheat, which are currently accommodated in the Triticeae Mapped EST Database. These relational data are searchable from the search interfaces of both databases. The current TriFLDB contains 15,871 full-length CDSs from barley and wheat and includes putative full-length cDNAs for barley and wheat, which are publicly accessible. This informative content provides an informatics gateway for Triticeae genomics and grass comparative genomics. TriFLDB is publicly available at http://TriFLDB.psc.riken.jp.
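
The grouping of deduced protein sequences "by hierarchical clustering in stepwise thresholds of sequence identity" can be illustrated with a small sketch. This is not the TriFLDB pipeline: the abstract does not state the identity measure, the linkage rule, or the threshold values. The position-wise identity function, the single-linkage rule, the thresholds (0.85, 0.75, 0.5), and the toy sequences and their names below are all assumptions chosen for illustration only.

```python
from itertools import combinations

# Hypothetical toy peptides; names and sequences are invented for illustration.
TOY_SEQS = {
    "wheat_1": "MKTAYIAKQR",
    "barley_1": "MKTAYIAKQK",
    "rice_1": "MKTAYLSKQK",
    "arabidopsis_1": "MGSSHHHHHH",
}


def identity(a: str, b: str) -> float:
    """Fraction of matching positions, normalized by the longer sequence.

    Assumption: a naive, alignment-free stand-in for a real identity score.
    """
    matches = sum(x == y for x, y in zip(a, b))
    return matches / max(len(a), len(b))


def cluster_at(seqs: dict[str, str], threshold: float) -> list[list[str]]:
    """Single-linkage clusters: any pair with identity >= threshold is merged."""
    parent = {name: name for name in seqs}

    def find(x: str) -> str:
        # Follow parent links to the cluster root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(seqs, 2):
        if identity(seqs[a], seqs[b]) >= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, list[str]] = {}
    for name in seqs:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in groups.values())


def stepwise_clusters(seqs, thresholds=(0.85, 0.75, 0.5)):
    """Cluster at each threshold, from strict to relaxed.

    With single linkage, clusters at a lower threshold contain those
    formed at a higher one, so the levels form a hierarchy.
    """
    return {t: cluster_at(seqs, t) for t in thresholds}


if __name__ == "__main__":
    for t, clusters in stepwise_clusters(TOY_SEQS).items():
        print(t, clusters)
```

In this toy data, the wheat and barley peptides merge at the strictest threshold, the rice peptide joins them once the threshold is relaxed, and the unrelated Arabidopsis peptide stays separate at every level.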

Original language: English
Pages (from-to): 1135-1146
Number of pages: 12
Journal: Plant Physiology
Volume: 150
Issue number: 3
DOI: 10.1104/pp.109.138214
Publication status: Published - Jul 2009
Externally published: Yes

Fingerprint

Genomics
Poaceae
Databases
grasses
genomics
Hordeum
Triticum
Sorghum
barley
Arabidopsis
Cluster Analysis
wheat
Amino Acid Sequence
amino acid sequences
Complementary DNA
Sorghum (Poaceae)
Gene Ontology
Informatics
Expressed Sequence Tags
Proteome

ASJC Scopus subject areas

  • Plant Science
  • Genetics
  • Physiology

Cite this

TriFLDB: A database of clustered full-length coding sequences from Triticeae with applications to comparative grass genomics. / Mochida, Keiichi; Yoshida, Takuhiro; Sakurai, Tetsuya; Ogihara, Yasunari; Shinozaki, Kazuo.

In: Plant Physiology, Vol. 150, No. 3, 07.2009, p. 1135-1146.

@article{b129eca2f9d64834acbfdfd60f60e95b,
title = "TriFLDB: A database of clustered full-length coding sequences from Triticeae with applications to comparative grass genomics",
author = "Keiichi Mochida and Takuhiro Yoshida and Tetsuya Sakurai and Yasunari Ogihara and Kazuo Shinozaki",
year = "2009",
month = "7",
doi = "10.1104/pp.109.138214",
language = "English",
volume = "150",
pages = "1135--1146",
journal = "Plant Physiology",
issn = "0032-0889",
publisher = "American Society of Plant Biologists",
number = "3",

}

TY - JOUR

T1 - TriFLDB

T2 - A database of clustered full-length coding sequences from Triticeae with applications to comparative grass genomics

AU - Mochida, Keiichi

AU - Yoshida, Takuhiro

AU - Sakurai, Tetsuya

AU - Ogihara, Yasunari

AU - Shinozaki, Kazuo

PY - 2009/7

Y1 - 2009/7

UR - http://www.scopus.com/inward/record.url?scp=67650100882&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=67650100882&partnerID=8YFLogxK

U2 - 10.1104/pp.109.138214

DO - 10.1104/pp.109.138214

M3 - Article

C2 - 19448038

AN - SCOPUS:67650100882

VL - 150

SP - 1135

EP - 1146

JO - Plant Physiology

JF - Plant Physiology

SN - 0032-0889

IS - 3

ER -