Performance comparison of second- and third-generation sequencers using a bacterial genome with two chromosomes

Mari Miyamoto, Daisuke Motooka, Kazuyoshi Goto, Takamasa Imai, Kazutoshi Yoshitake, Naohisa Goto, Tetsuya Iida, Teruo Yasunaga, Toshihiro Horii, Kazuharu Arakawa, Masahiro Kasahara, Shota Nakamura

Research output: Contribution to journal › Article

48 Citations (Scopus)

Abstract

Background: The availability of diverse second- and third-generation sequencing technologies enables the rapid determination of the sequences of bacterial genomes. However, identifying the sequencing technology most suitable for producing a finished genome with multiple chromosomes remains a challenge. We evaluated the abilities of the following three second-generation sequencers: Roche 454 GS Junior (GS Jr), Life Technologies Ion PGM (Ion PGM), and Illumina MiSeq (MiSeq) and a third-generation sequencer, the Pacific Biosciences RS sequencer (PacBio), by sequencing and assembling the genome of Vibrio parahaemolyticus, which consists of a 5-Mb genome comprising two circular chromosomes.

Results: We sequenced the genome of V. parahaemolyticus with GS Jr, Ion PGM, MiSeq, and PacBio and performed de novo assembly with several genome assemblers. Although GS Jr generated the longest mean read length of 418 bp among the second-generation sequencers, the maximum contig length of the best assembly from GS Jr was 165 kbp, and the number of contigs was 309. Single runs of Ion PGM and MiSeq produced data of considerably greater sequencing coverage, 279× and 1,927×, respectively. The optimized result for Ion PGM contained 61 contigs assembled from reads of 77× coverage, and the longest contig was 895 kbp in size. Those for MiSeq were 34 contigs, 58× coverage, and 733 kbp, respectively. These results suggest that higher coverage depth is unnecessary for a better assembly result. We observed that multiple rRNA coding regions were fragmented in the assemblies from the second-generation sequencers, whereas PacBio generated two exceptionally long contigs of 3,288,561 and 1,875,537 bp, each of which was from a single chromosome, with 73× coverage and mean read length 3,119 bp, allowing us to determine the absolute positions of all rRNA operons.

Conclusions: PacBio outperformed the other sequencers in terms of the length of contigs and reconstructed the greatest portion of the genome, achieving a genome assembly of "finished grade" because of its long reads. It showed the potential to assemble more complex genomes with multiple chromosomes containing more repetitive sequences.
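The PacBio figures quoted in the abstract can be cross-checked with a little arithmetic. The sketch below is illustrative only and is not taken from the paper: it sums the two contig lengths to get the assembled genome size, then estimates how many reads the stated coverage would imply, assuming the usual relation coverage ≈ (number of reads × mean read length) / genome size. The abstract does not say which genome size was used to compute the 73× figure, so using the assembled size here is an assumption, and the function names are hypothetical.

```python
# Figures quoted in the abstract for the PacBio assembly.
PACBIO_CONTIGS_BP = [3_288_561, 1_875_537]  # one contig per chromosome
PACBIO_COVERAGE = 73                        # fold coverage
PACBIO_MEAN_READ_BP = 3_119                 # mean read length in bp


def assembly_size(contig_lengths):
    """Total number of assembled bases across all contigs."""
    return sum(contig_lengths)


def implied_read_count(coverage, genome_size_bp, mean_read_bp):
    """Rough read count implied by coverage = reads * mean_read / genome_size.

    Assumption: coverage was computed against genome_size_bp; the abstract
    does not state this, so treat the result as an order-of-magnitude check.
    """
    return round(coverage * genome_size_bp / mean_read_bp)


genome_bp = assembly_size(PACBIO_CONTIGS_BP)
reads = implied_read_count(PACBIO_COVERAGE, genome_bp, PACBIO_MEAN_READ_BP)

print(f"assembled size: {genome_bp:,} bp")
print(f"implied reads at {PACBIO_COVERAGE}x: about {reads:,}")
```

The assembled total comes to roughly 5.16 Mb, which agrees with the "5-Mb genome" described in the Background.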

Original language: English
Article number: 699
Journal: BMC Genomics
Volume: 15
Issue number: 1
DOI: https://doi.org/10.1186/1471-2164-15-699
Publication status: Published - Aug 21 2014

Fingerprint

Bacterial Genomes
Chromosomes
Genome
Ions
Vibrio parahaemolyticus
Technology
rRNA Operon
Nucleic Acid Repetitive Sequences
Sequence Analysis

Keywords

  • de novo assembly
  • Illumina MiSeq
  • Ion Torrent PGM
  • Next-generation sequencing
  • PacBio RS system
  • Roche 454 GS Junior

ASJC Scopus subject areas

  • Biotechnology
  • Genetics
  • Medicine(all)

Cite this

Performance comparison of second- and third-generation sequencers using a bacterial genome with two chromosomes. / Miyamoto, Mari; Motooka, Daisuke; Goto, Kazuyoshi; Imai, Takamasa; Yoshitake, Kazutoshi; Goto, Naohisa; Iida, Tetsuya; Yasunaga, Teruo; Horii, Toshihiro; Arakawa, Kazuharu; Kasahara, Masahiro; Nakamura, Shota.

In: BMC Genomics, Vol. 15, No. 1, 699, 21.08.2014.

Miyamoto, M, Motooka, D, Goto, K, Imai, T, Yoshitake, K, Goto, N, Iida, T, Yasunaga, T, Horii, T, Arakawa, K, Kasahara, M & Nakamura, S 2014, 'Performance comparison of second- and third-generation sequencers using a bacterial genome with two chromosomes', BMC Genomics, vol. 15, no. 1, 699. https://doi.org/10.1186/1471-2164-15-699
Miyamoto, Mari ; Motooka, Daisuke ; Goto, Kazuyoshi ; Imai, Takamasa ; Yoshitake, Kazutoshi ; Goto, Naohisa ; Iida, Tetsuya ; Yasunaga, Teruo ; Horii, Toshihiro ; Arakawa, Kazuharu ; Kasahara, Masahiro ; Nakamura, Shota. / Performance comparison of second- and third-generation sequencers using a bacterial genome with two chromosomes. In: BMC Genomics. 2014 ; Vol. 15, No. 1.
@article{d8a0de16d26146c88fad0ac5b621f29b,
title = "Performance comparison of second- and third-generation sequencers using a bacterial genome with two chromosomes",
abstract = "Background: The availability of diverse second- and third-generation sequencing technologies enables the rapid determination of the sequences of bacterial genomes. However, identifying the sequencing technology most suitable for producing a finished genome with multiple chromosomes remains a challenge. We evaluated the abilities of the following three second-generation sequencers: Roche 454 GS Junior (GS Jr), Life Technologies Ion PGM (Ion PGM), and Illumina MiSeq (MiSeq) and a third-generation sequencer, the Pacific Biosciences RS sequencer (PacBio), by sequencing and assembling the genome of Vibrio parahaemolyticus, which consists of a 5-Mb genome comprising two circular chromosomes. Results: We sequenced the genome of V. parahaemolyticus with GS Jr, Ion PGM, MiSeq, and PacBio and performed de novo assembly with several genome assemblers. Although GS Jr generated the longest mean read length of 418 bp among the second-generation sequencers, the maximum contig length of the best assembly from GS Jr was 165 kbp, and the number of contigs was 309. Single runs of Ion PGM and MiSeq produced data of considerably greater sequencing coverage, 279× and 1,927×, respectively. The optimized result for Ion PGM contained 61 contigs assembled from reads of 77× coverage, and the longest contig was 895 kbp in size. Those for MiSeq were 34 contigs, 58× coverage, and 733 kbp, respectively. These results suggest that higher coverage depth is unnecessary for a better assembly result. We observed that multiple rRNA coding regions were fragmented in the assemblies from the second-generation sequencers, whereas PacBio generated two exceptionally long contigs of 3,288,561 and 1,875,537 bp, each of which was from a single chromosome, with 73× coverage and mean read length 3,119 bp, allowing us to determine the absolute positions of all rRNA operons. Conclusions: PacBio outperformed the other sequencers in terms of the length of contigs and reconstructed the greatest portion of the genome, achieving a genome assembly of {"}finished grade{"} because of its long reads. It showed the potential to assemble more complex genomes with multiple chromosomes containing more repetitive sequences.",
keywords = "de novo assembly, Illumina MiSeq, Ion Torrent PGM, Next-generation sequencing, PacBio RS system, Roche 454 GS Junior",
author = "Mari Miyamoto and Daisuke Motooka and Kazuyoshi Goto and Takamasa Imai and Kazutoshi Yoshitake and Naohisa Goto and Tetsuya Iida and Teruo Yasunaga and Toshihiro Horii and Kazuharu Arakawa and Masahiro Kasahara and Shota Nakamura",
year = "2014",
month = "8",
day = "21",
doi = "10.1186/1471-2164-15-699",
language = "English",
volume = "15",
journal = "BMC Genomics",
issn = "1471-2164",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Performance comparison of second- and third-generation sequencers using a bacterial genome with two chromosomes

AU - Miyamoto, Mari

AU - Motooka, Daisuke

AU - Goto, Kazuyoshi

AU - Imai, Takamasa

AU - Yoshitake, Kazutoshi

AU - Goto, Naohisa

AU - Iida, Tetsuya

AU - Yasunaga, Teruo

AU - Horii, Toshihiro

AU - Arakawa, Kazuharu

AU - Kasahara, Masahiro

AU - Nakamura, Shota

PY - 2014/8/21

Y1 - 2014/8/21

N2 - Background: The availability of diverse second- and third-generation sequencing technologies enables the rapid determination of the sequences of bacterial genomes. However, identifying the sequencing technology most suitable for producing a finished genome with multiple chromosomes remains a challenge. We evaluated the abilities of the following three second-generation sequencers: Roche 454 GS Junior (GS Jr), Life Technologies Ion PGM (Ion PGM), and Illumina MiSeq (MiSeq) and a third-generation sequencer, the Pacific Biosciences RS sequencer (PacBio), by sequencing and assembling the genome of Vibrio parahaemolyticus, which consists of a 5-Mb genome comprising two circular chromosomes. Results: We sequenced the genome of V. parahaemolyticus with GS Jr, Ion PGM, MiSeq, and PacBio and performed de novo assembly with several genome assemblers. Although GS Jr generated the longest mean read length of 418 bp among the second-generation sequencers, the maximum contig length of the best assembly from GS Jr was 165 kbp, and the number of contigs was 309. Single runs of Ion PGM and MiSeq produced data of considerably greater sequencing coverage, 279× and 1,927×, respectively. The optimized result for Ion PGM contained 61 contigs assembled from reads of 77× coverage, and the longest contig was 895 kbp in size. Those for MiSeq were 34 contigs, 58× coverage, and 733 kbp, respectively. These results suggest that higher coverage depth is unnecessary for a better assembly result. We observed that multiple rRNA coding regions were fragmented in the assemblies from the second-generation sequencers, whereas PacBio generated two exceptionally long contigs of 3,288,561 and 1,875,537 bp, each of which was from a single chromosome, with 73× coverage and mean read length 3,119 bp, allowing us to determine the absolute positions of all rRNA operons. Conclusions: PacBio outperformed the other sequencers in terms of the length of contigs and reconstructed the greatest portion of the genome, achieving a genome assembly of "finished grade" because of its long reads. It showed the potential to assemble more complex genomes with multiple chromosomes containing more repetitive sequences.

KW - de novo assembly

KW - Illumina MiSeq

KW - Ion Torrent PGM

KW - Next-generation sequencing

KW - PacBio RS system

KW - Roche 454 GS Junior

UR - http://www.scopus.com/inward/record.url?scp=84906823754&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84906823754&partnerID=8YFLogxK

U2 - 10.1186/1471-2164-15-699

DO - 10.1186/1471-2164-15-699

M3 - Article

VL - 15

JO - BMC Genomics

JF - BMC Genomics

SN - 1471-2164

IS - 1

M1 - 699

ER -