Complete de Novo Assembly of Monoclonal Antibody Sequences

Ngoc Hieu Tran, M. Ziaur Rahman, Lin He, Lei Xin, Baozhen Shan, Ming Li

Research output: Contribution to journal › Article

20 Citations (Scopus)

Abstract

De novo protein sequencing is one of the key problems in mass spectrometry-based proteomics, especially for novel proteins such as monoclonal antibodies, for which genome information is often limited or unavailable. However, due to limitations in peptide fragmentation and coverage, as well as ambiguities in spectra interpretation, complete de novo assembly of unknown protein sequences remains challenging. To address this problem, we propose an integrated system, ALPS, which for the first time can automatically assemble full-length monoclonal antibody sequences. Our system integrates de novo sequenced peptides, their quality scores, and error-correction information from databases into a weighted de Bruijn graph to assemble protein sequences. We evaluated the performance of ALPS on two antibody data sets, each including a heavy chain and a light chain. The results show that ALPS was able to assemble three complete monoclonal antibody sequences of length 216-441 AA, at 100% coverage and 96.64-100% accuracy.
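
The abstract names a weighted de Bruijn graph built from de novo peptides and their quality scores. The following is only a minimal sketch of that general idea, not the ALPS implementation: the function names, the choice of k, the averaging of per-residue confidences into an edge weight, and the greedy extension rule are all assumptions made for illustration, and the database-based error correction step is omitted entirely.

```python
from collections import defaultdict


def build_weighted_graph(peptides, k):
    """Build a weighted de Bruijn graph over amino-acid k-mers.

    peptides: list of (sequence, per-residue confidence scores).
    Each (k+1)-mer contributes an edge between its two k-mers; the edge
    weight accumulates the mean confidence of the residues it spans
    (an illustrative weighting, assumed here).
    """
    graph = defaultdict(lambda: defaultdict(float))
    for seq, scores in peptides:
        for i in range(len(seq) - k):
            left, right = seq[i:i + k], seq[i + 1:i + k + 1]
            graph[left][right] += sum(scores[i:i + k + 1]) / (k + 1)
    return graph


def greedy_extend(graph, seed):
    """Extend a contig from a seed k-mer by following the heaviest edge."""
    contig, node, visited = seed, seed, {seed}
    while graph.get(node):
        nxt = max(graph[node], key=graph[node].get)
        if nxt in visited:  # stop rather than loop through a repeat
            break
        contig += nxt[-1]
        visited.add(nxt)
        node = nxt
    return contig


# Hypothetical toy input: three overlapping confident peptides and one
# low-confidence peptide carrying a single wrong residue (E read as D).
peptides = [
    ("EVQLVESG", [0.9] * 8),
    ("LVESGGGL", [0.9] * 8),
    ("SGGGLVQ", [0.9] * 7),
    ("EVQLVDSG", [0.3] * 8),
]
graph = build_weighted_graph(peptides, k=4)
contig = greedy_extend(graph, "EVQL")
print(contig)  # EVQLVESGGGLVQ
```

In this toy case the erroneous branch loses at the k-mer VQLV because its edge carries less accumulated confidence, which is the intuition behind weighting the graph by quality scores.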

Original language: English
Article number: 31730
Journal: Scientific Reports
Volume: 6
DOI: https://doi.org/10.1038/srep31730
Publication status: Published - Aug 26 2016
Externally published: Yes

Fingerprint

Monoclonal Antibodies
Peptides
Proteins
Protein Sequence Analysis
Proteomics
Mass Spectrometry
Genome
Databases
Light
Antibodies
Datasets

ASJC Scopus subject areas

  • General

Cite this

Tran, N. H., Rahman, M. Z., He, L., Xin, L., Shan, B., & Li, M. (2016). Complete de Novo Assembly of Monoclonal Antibody Sequences. Scientific Reports, 6, [31730]. https://doi.org/10.1038/srep31730

@article{2e3fd893595a476c88be33740b38eaf2,
title = "Complete de Novo Assembly of Monoclonal Antibody Sequences",
abstract = "De novo protein sequencing is one of the key problems in mass spectrometry-based proteomics, especially for novel proteins such as monoclonal antibodies for which genome information is often limited or not available. However, due to limitations in peptides fragmentation and coverage, as well as ambiguities in spectra interpretation, complete de novo assembly of unknown protein sequences still remains challenging. To address this problem, we propose an integrated system, ALPS, which for the first time can automatically assemble full-length monoclonal antibody sequences. Our system integrates de novo sequencing peptides, their quality scores and error-correction information from databases into a weighted de Bruijn graph to assemble protein sequences. We evaluated ALPS performance on two antibody data sets, each including a heavy chain and a light chain. The results show that ALPS was able to assemble three complete monoclonal antibody sequences of length 216-441 AA, at 100{\%} coverage, and 96.64-100{\%} accuracy.",
author = "Tran, {Ngoc Hieu} and Rahman, {M. Ziaur} and He, Lin and Xin, Lei and Shan, Baozhen and Li, Ming",
year = "2016",
month = "8",
day = "26",
doi = "10.1038/srep31730",
language = "English",
volume = "6",
journal = "Scientific Reports",
issn = "2045-2322",
publisher = "Nature Publishing Group",
}

TY - JOUR
T1 - Complete de Novo Assembly of Monoclonal Antibody Sequences
AU - Tran, Ngoc Hieu
AU - Rahman, M. Ziaur
AU - He, Lin
AU - Xin, Lei
AU - Shan, Baozhen
AU - Li, Ming
PY - 2016/8/26
Y1 - 2016/8/26
N2 - De novo protein sequencing is one of the key problems in mass spectrometry-based proteomics, especially for novel proteins such as monoclonal antibodies for which genome information is often limited or not available. However, due to limitations in peptides fragmentation and coverage, as well as ambiguities in spectra interpretation, complete de novo assembly of unknown protein sequences still remains challenging. To address this problem, we propose an integrated system, ALPS, which for the first time can automatically assemble full-length monoclonal antibody sequences. Our system integrates de novo sequencing peptides, their quality scores and error-correction information from databases into a weighted de Bruijn graph to assemble protein sequences. We evaluated ALPS performance on two antibody data sets, each including a heavy chain and a light chain. The results show that ALPS was able to assemble three complete monoclonal antibody sequences of length 216-441 AA, at 100% coverage, and 96.64-100% accuracy.

UR - http://www.scopus.com/inward/record.url?scp=84984649632&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84984649632&partnerID=8YFLogxK
U2 - 10.1038/srep31730
DO - 10.1038/srep31730
M3 - Article
AN - SCOPUS:84984649632
VL - 6
JO - Scientific Reports
JF - Scientific Reports
SN - 2045-2322
M1 - 31730
ER -