Detecting homology of distantly related proteins with consensus sequences

Research output: Contribution to journal › Article

109 Citations (Scopus)

Abstract

A simple protocol is described that is suitable for the detection of distantly related members of a protein family. In this procedure, similarity to a consensus sequence is used to distinguish chance similarity from similarity due to common ancestry. The consensus sequence is constructed from the sequences of established members of a protein family and it incorporates features characteristic of the protein fold of this family: conserved residues, the pattern of variable and conserved segments, preferred location of gaps etc. The database is searched with the consensus sequence, using the unitary matrix or log odds matrix for scoring the alignments, with variable gap penalty. The advantage of the method is that it weights key residues, ignores sequence similarity in variable segments (thus partially eliminating "background noise" coming from chance similarity), distinguishes gaps disrupting conserved segments from those occurring in positions known to be tolerant of gap events. The utility of the method was demonstrated in the case of the protein family homologous with the internal repeats of complement B as well as the internal repeats identified in fibroblast proteoglycan PG40. The consensus sequence method succeeded in finding some new members of these protein families that could not be detected by earlier methods of sequence comparison.
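The scoring idea in the abstract can be illustrated with a minimal Python sketch. It is not the published procedure. It assumes a unitary (identity) score scaled by a per-position weight, a linear gap penalty that depends on the consensus position, and a plain global alignment. The names `ConsensusPosition`, `consensus_score` and `TOY_CONSENSUS`, the marker `x` for variable positions, and all numeric values are hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence

VARIABLE = "x"  # assumed marker for a variable (unscored) consensus position


@dataclass(frozen=True)
class ConsensusPosition:
    residue: str        # consensus residue, or VARIABLE
    weight: float       # score for an identity at this position (unitary matrix, scaled)
    gap_penalty: float  # cost of a gap at this position (low where gaps are tolerated)


def consensus_score(consensus: Sequence[ConsensusPosition], sequence: str) -> float:
    """Global alignment score of `sequence` against `consensus`.

    Identities at conserved positions earn that position's weight,
    variable positions never score, and each gapped residue costs the
    penalty of the consensus position where the gap occurs.
    """
    if not consensus:
        raise ValueError("consensus must contain at least one position")
    n, m = len(consensus), len(sequence)
    # dp[i][j] = best score aligning consensus[:i] with sequence[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] - consensus[i - 1].gap_penalty
    for j in range(1, m + 1):
        # residues placed before the consensus pay the first position's penalty
        dp[0][j] = dp[0][j - 1] - consensus[0].gap_penalty
    for i in range(1, n + 1):
        pos = consensus[i - 1]
        for j in range(1, m + 1):
            identity = pos.residue != VARIABLE and pos.residue == sequence[j - 1]
            aligned = dp[i - 1][j - 1] + (pos.weight if identity else 0.0)
            skip_consensus = dp[i - 1][j] - pos.gap_penalty  # consensus position left unmatched
            extra_residue = dp[i][j - 1] - pos.gap_penalty   # extra residue in the sequence
            dp[i][j] = max(aligned, skip_consensus, extra_residue)
    return dp[n][m]


# Toy consensus: two key cysteines flanking a gap-tolerant variable segment.
TOY_CONSENSUS = [
    ConsensusPosition("C", weight=2.0, gap_penalty=3.0),
    ConsensusPosition(VARIABLE, weight=0.0, gap_penalty=0.5),
    ConsensusPosition(VARIABLE, weight=0.0, gap_penalty=0.5),
    ConsensusPosition("C", weight=2.0, gap_penalty=3.0),
]

if __name__ == "__main__":
    for candidate in ("CAAC", "CAC", "CAAAC", "AAAA"):
        print(candidate, consensus_score(TOY_CONSENSUS, candidate))
```

With these illustrative values, `CAAC` scores 4.0, while a one-residue deletion (`CAC`) or insertion (`CAAAC`) inside the variable segment scores 3.5, because gaps there cost only 0.5. A sequence lacking both key residues (`AAAA`) scores 0.0 however similar its variable segment is.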

Original language: English
Pages (from-to): 567-577
Number of pages: 11
Journal: Journal of Molecular Biology
Volume: 198
Issue number: 4
DOIs: 10.1016/0022-2836(87)90200-2
Publication status: Published - Dec 20 1987

Fingerprint

Consensus Sequence
Proteins
Proteoglycans
Noise
Fibroblasts
Databases
Weights and Measures

ASJC Scopus subject areas

  • Virology

Cite this

Detecting homology of distantly related proteins with consensus sequences. / Patthy, L.

In: Journal of Molecular Biology, Vol. 198, No. 4, 20.12.1987, p. 567-577.

@article{4a07a3d08442400cad1ad246523f2695,
title = "Detecting homology of distantly related proteins with consensus sequences",
abstract = "A simple protocol is described that is suitable for the detection of distantly related members of a protein family. In this procedure, similarity to a consensus sequence is used to distinguish chance similarity from similarity due to common ancestry. The consensus sequence is constructed from the sequences of established members of a protein family and it incorporates features characteristic of the protein fold of this family: conserved residues, the pattern of variable and conserved segments, preferred location of gaps etc. The database is searched with the consensus sequence, using the unitary matrix or log odds matrix for scoring the alignments, with variable gap penalty. The advantage of the method is that it weights key residues, ignores sequence similarity in variable segments (thus partially eliminating {"}background noise{"} coming from chance similarity), distinguishes gaps disrupting conserved segments from those occurring in positions known to be tolerant of gap events. The utility of the method was demonstrated in the case of the protein family homologous with the internal repeats of complement B as well as the internal repeats identified in fibroblast proteoglycan PG40. The consensus sequence method succeeded in finding some new members of these protein families that could not be detected by earlier methods of sequence comparison.",
author = "L. Patthy",
year = "1987",
month = "12",
day = "20",
doi = "10.1016/0022-2836(87)90200-2",
language = "English",
volume = "198",
pages = "567--577",
journal = "Journal of Molecular Biology",
issn = "0022-2836",
publisher = "Academic Press Inc.",
number = "4",
}

TY - JOUR

T1 - Detecting homology of distantly related proteins with consensus sequences

AU - Patthy, L.

PY - 1987/12/20

Y1 - 1987/12/20

N2 - A simple protocol is described that is suitable for the detection of distantly related members of a protein family. In this procedure, similarity to a consensus sequence is used to distinguish chance similarity from similarity due to common ancestry. The consensus sequence is constructed from the sequences of established members of a protein family and it incorporates features characteristic of the protein fold of this family: conserved residues, the pattern of variable and conserved segments, preferred location of gaps etc. The database is searched with the consensus sequence, using the unitary matrix or log odds matrix for scoring the alignments, with variable gap penalty. The advantage of the method is that it weights key residues, ignores sequence similarity in variable segments (thus partially eliminating "background noise" coming from chance similarity), distinguishes gaps disrupting conserved segments from those occurring in positions known to be tolerant of gap events. The utility of the method was demonstrated in the case of the protein family homologous with the internal repeats of complement B as well as the internal repeats identified in fibroblast proteoglycan PG40. The consensus sequence method succeeded in finding some new members of these protein families that could not be detected by earlier methods of sequence comparison.

AB - A simple protocol is described that is suitable for the detection of distantly related members of a protein family. In this procedure, similarity to a consensus sequence is used to distinguish chance similarity from similarity due to common ancestry. The consensus sequence is constructed from the sequences of established members of a protein family and it incorporates features characteristic of the protein fold of this family: conserved residues, the pattern of variable and conserved segments, preferred location of gaps etc. The database is searched with the consensus sequence, using the unitary matrix or log odds matrix for scoring the alignments, with variable gap penalty. The advantage of the method is that it weights key residues, ignores sequence similarity in variable segments (thus partially eliminating "background noise" coming from chance similarity), distinguishes gaps disrupting conserved segments from those occurring in positions known to be tolerant of gap events. The utility of the method was demonstrated in the case of the protein family homologous with the internal repeats of complement B as well as the internal repeats identified in fibroblast proteoglycan PG40. The consensus sequence method succeeded in finding some new members of these protein families that could not be detected by earlier methods of sequence comparison.

UR - http://www.scopus.com/inward/record.url?scp=0023576317&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0023576317&partnerID=8YFLogxK

U2 - 10.1016/0022-2836(87)90200-2

DO - 10.1016/0022-2836(87)90200-2

M3 - Article

C2 - 3430622

AN - SCOPUS:0023576317

VL - 198

SP - 567

EP - 577

JO - Journal of Molecular Biology

JF - Journal of Molecular Biology

SN - 0022-2836

IS - 4

ER -