How reliably can we predict the reliability of protein structure predictions?

I. Miklós, Ádám Novák, Balázs Dombai, Jotun Hein

Research output: Contribution to journalArticle

10 Citations (Scopus)

Abstract

Background; Comparative methods have been the standard techniques for in silico protein structure prediction. The prediction is based on a multiple alignment that contains both reference sequences with known structures and the sequence whose unknown structure is predicted. Intensive research has been made to improve the quality of multiple alignments, since misaligned parts of the multiple alignment yield misleading predictions. However, sometimes all methods fail to predict the correct alignment, because the evolutionary signal is too weak to find the homologous parts due to the large number of mutations that separate the sequences. Results: Stochastic sequence alignment methods define a posterior distribution of possible multiple alignments. They can highlight the most likely alignment, and above that, they can give posterior probabilities for each alignment column. We made a comprehensive study on the HOMSTRAD database of structural alignments, predicting secondary structures in four different ways. We showed that alignment posterior probabilities correlate with the reliability of secondary structure predictions, though the strength of the correlation is different for different protocols. The correspondence between the reliability of secondary structure predictions and alignment posterior probabilities is the closest to the identity function when the secondary structure posterior probabilities are calculated from the posterior distribution of multiple alignments. The largest deviation from the identity function has been obtained in the case of predicting secondary structures from a single optimal pairwise alignment. We also showed that alignment posterior probabilities correlate with the 3D distances between Cα amino acids in superimposed tertiary structures. Conclusion: Alignment posterior probabilities can be used to a priori detect errors in comparative models on the sequence alignment level.

Original languageEnglish
Article number137
JournalBMC Bioinformatics
Volume9
DOIs
Publication statusPublished - Mar 3 2008

Fingerprint

Protein Structure Prediction
Alignment
Proteins
Predict
Posterior Probability
Sequence Alignment
Secondary Structure
Identity function
Structure Prediction
Computer Simulation
Posterior distribution
Correlate
Databases
Amino Acids
Mutation
Research
Prediction
Large Deviations

ASJC Scopus subject areas

  • Medicine(all)
  • Structural Biology
  • Applied Mathematics

Cite this

How reliably can we predict the reliability of protein structure predictions? / Miklós, I.; Novák, Ádám; Dombai, Balázs; Hein, Jotun.

In: BMC Bioinformatics, Vol. 9, 137, 03.03.2008.

Research output: Contribution to journalArticle

Miklós, I. ; Novák, Ádám ; Dombai, Balázs ; Hein, Jotun. / How reliably can we predict the reliability of protein structure predictions?. In: BMC Bioinformatics. 2008 ; Vol. 9.
@article{aade409dea2c45c291a92e65ab6dece5,
title = "How reliably can we predict the reliability of protein structure predictions?",
abstract = "Background; Comparative methods have been the standard techniques for in silico protein structure prediction. The prediction is based on a multiple alignment that contains both reference sequences with known structures and the sequence whose unknown structure is predicted. Intensive research has been made to improve the quality of multiple alignments, since misaligned parts of the multiple alignment yield misleading predictions. However, sometimes all methods fail to predict the correct alignment, because the evolutionary signal is too weak to find the homologous parts due to the large number of mutations that separate the sequences. Results: Stochastic sequence alignment methods define a posterior distribution of possible multiple alignments. They can highlight the most likely alignment, and above that, they can give posterior probabilities for each alignment column. We made a comprehensive study on the HOMSTRAD database of structural alignments, predicting secondary structures in four different ways. We showed that alignment posterior probabilities correlate with the reliability of secondary structure predictions, though the strength of the correlation is different for different protocols. The correspondence between the reliability of secondary structure predictions and alignment posterior probabilities is the closest to the identity function when the secondary structure posterior probabilities are calculated from the posterior distribution of multiple alignments. The largest deviation from the identity function has been obtained in the case of predicting secondary structures from a single optimal pairwise alignment. We also showed that alignment posterior probabilities correlate with the 3D distances between Cα amino acids in superimposed tertiary structures. Conclusion: Alignment posterior probabilities can be used to a priori detect errors in comparative models on the sequence alignment level.",
author = "I. Mikl{\'o}s and {\'A}d{\'a}m Nov{\'a}k and Bal{\'a}zs Dombai and Jotun Hein",
year = "2008",
month = "3",
day = "3",
doi = "10.1186/1471-2105-9-137",
language = "English",
volume = "9",
journal = "BMC Bioinformatics",
issn = "1471-2105",
publisher = "BioMed Central",

}

TY - JOUR

T1 - How reliably can we predict the reliability of protein structure predictions?

AU - Miklós, I.

AU - Novák, Ádám

AU - Dombai, Balázs

AU - Hein, Jotun

PY - 2008/3/3

Y1 - 2008/3/3

N2 - Background; Comparative methods have been the standard techniques for in silico protein structure prediction. The prediction is based on a multiple alignment that contains both reference sequences with known structures and the sequence whose unknown structure is predicted. Intensive research has been made to improve the quality of multiple alignments, since misaligned parts of the multiple alignment yield misleading predictions. However, sometimes all methods fail to predict the correct alignment, because the evolutionary signal is too weak to find the homologous parts due to the large number of mutations that separate the sequences. Results: Stochastic sequence alignment methods define a posterior distribution of possible multiple alignments. They can highlight the most likely alignment, and above that, they can give posterior probabilities for each alignment column. We made a comprehensive study on the HOMSTRAD database of structural alignments, predicting secondary structures in four different ways. We showed that alignment posterior probabilities correlate with the reliability of secondary structure predictions, though the strength of the correlation is different for different protocols. The correspondence between the reliability of secondary structure predictions and alignment posterior probabilities is the closest to the identity function when the secondary structure posterior probabilities are calculated from the posterior distribution of multiple alignments. The largest deviation from the identity function has been obtained in the case of predicting secondary structures from a single optimal pairwise alignment. We also showed that alignment posterior probabilities correlate with the 3D distances between Cα amino acids in superimposed tertiary structures. Conclusion: Alignment posterior probabilities can be used to a priori detect errors in comparative models on the sequence alignment level.

AB - Background; Comparative methods have been the standard techniques for in silico protein structure prediction. The prediction is based on a multiple alignment that contains both reference sequences with known structures and the sequence whose unknown structure is predicted. Intensive research has been made to improve the quality of multiple alignments, since misaligned parts of the multiple alignment yield misleading predictions. However, sometimes all methods fail to predict the correct alignment, because the evolutionary signal is too weak to find the homologous parts due to the large number of mutations that separate the sequences. Results: Stochastic sequence alignment methods define a posterior distribution of possible multiple alignments. They can highlight the most likely alignment, and above that, they can give posterior probabilities for each alignment column. We made a comprehensive study on the HOMSTRAD database of structural alignments, predicting secondary structures in four different ways. We showed that alignment posterior probabilities correlate with the reliability of secondary structure predictions, though the strength of the correlation is different for different protocols. The correspondence between the reliability of secondary structure predictions and alignment posterior probabilities is the closest to the identity function when the secondary structure posterior probabilities are calculated from the posterior distribution of multiple alignments. The largest deviation from the identity function has been obtained in the case of predicting secondary structures from a single optimal pairwise alignment. We also showed that alignment posterior probabilities correlate with the 3D distances between Cα amino acids in superimposed tertiary structures. Conclusion: Alignment posterior probabilities can be used to a priori detect errors in comparative models on the sequence alignment level.

UR - http://www.scopus.com/inward/record.url?scp=42449102610&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=42449102610&partnerID=8YFLogxK

U2 - 10.1186/1471-2105-9-137

DO - 10.1186/1471-2105-9-137

M3 - Article

C2 - 18315874

AN - SCOPUS:42449102610

VL - 9

JO - BMC Bioinformatics

JF - BMC Bioinformatics

SN - 1471-2105

M1 - 137

ER -