Stochastic models of sequence evolution including insertion-deletion events

I. Miklós, Ádám Novák, Rahul Satija, Rune Lyngsø, Jotun Hein

Research output: Contribution to journal (Article)

12 Citations (Scopus)

Abstract

Comparison of sequences that have descended from a common ancestor based on an explicit stochastic model of substitutions, insertions and deletions has risen to prominence in the last decade. Making statements about the positions of insertions and deletions (indels) is central in sequence and genome analysis and is called alignment. This statistical approach is harder, both conceptually and computationally, than competing approaches that choose an alignment according to some optimality criterion. However, it has major practical advantages for testing evolutionary hypotheses and estimating parameters. Basic dynamic programming approaches allow the analysis of up to 4-5 sequences. MCMC techniques can bring this to about 10-15 sequences. Beyond this, different or heuristic approaches must be used. Besides the computational challenges, increasing the realism of the underlying models is presently being addressed. A recent development that has been especially fruitful is combining statistical alignment with the problem of sequence annotation, that is, making statements about the function of each nucleotide or amino acid. So far gene finding, protein secondary structure prediction and regulatory signal detection have been tackled within this framework. Much progress can be reported, but clearly major challenges remain if this approach is to be central in the analysis of large incoming sequence data sets.
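To make the computational point in the abstract concrete, the following is a minimal sketch of summing a pair likelihood over every possible alignment by dynamic programming. It assumes a deliberately simplified single-state pair HMM, not any model from the article: the names `pair_likelihood` and `dp_table_cells` and the parameters `p_match`, `p_gap` and `identity` are hypothetical and chosen only for illustration. The second function shows why the same table becomes impractical beyond a handful of sequences, since the number of cells is the product of the sequence lengths plus one.

```python
def pair_likelihood(a, b, p_match=0.8, p_gap=0.05, identity=0.9):
    """Probability of observing sequences a and b, summed over all alignments.

    Toy model (an assumption, not the article's): each alignment column is a
    match with probability p_match, a gap in either sequence with probability
    p_gap each, and the process stops with the remaining probability.
    """
    p_end = 1.0 - p_match - 2.0 * p_gap
    if p_end <= 0.0:
        raise ValueError("p_match + 2 * p_gap must be below 1")
    q = 0.25  # uniform background frequency over four nucleotides

    def emit_pair(x, y):
        # Joint probability of an aligned pair of characters.
        return q * (identity if x == y else (1.0 - identity) / 3.0)

    n, m = len(a), len(b)
    # forward[i][j] = total probability of all partial alignments that
    # emit the first i characters of a and the first j characters of b.
    forward = [[0.0] * (m + 1) for _ in range(n + 1)]
    forward[0][0] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            total = 0.0
            if i > 0 and j > 0:  # last column aligns a[i-1] with b[j-1]
                total += p_match * emit_pair(a[i - 1], b[j - 1]) * forward[i - 1][j - 1]
            if i > 0:  # last column pairs a[i-1] with a gap
                total += p_gap * q * forward[i - 1][j]
            if j > 0:  # last column pairs b[j-1] with a gap
                total += p_gap * q * forward[i][j - 1]
            forward[i][j] = total
    return forward[n][m] * p_end


def dp_table_cells(lengths):
    """Number of cells in the full dynamic programming table for k sequences."""
    cells = 1
    for length in lengths:
        cells *= length + 1
    return cells
```

For two sequences of length 100 the table has 10,201 cells, while for five such sequences `dp_table_cells([100] * 5)` already exceeds ten billion, which is consistent with the abstract's remark that basic dynamic programming stops being feasible after a few sequences.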

Original language: English
Pages (from-to): 453-485
Number of pages: 33
Journal: Statistical Methods in Medical Research
Volume: 18
Issue number: 5
DOI: 10.1177/0962280208099500
Publication status: Published - 2009

Fingerprint

Deletion
Insertion
Sequence Analysis
Stochastic Model
Nucleotides
Genome
Amino Acids
Proteins
Alignment
Structure Prediction
Signal Detection
Optimality Criteria
Protein Structure
Secondary Structure
Markov Chain Monte Carlo
Annotation
Substitution
Parameter Estimation
Heuristics
Datasets

ASJC Scopus subject areas

  • Epidemiology
  • Health Information Management
  • Statistics and Probability

Cite this

Stochastic models of sequence evolution including insertion-deletion events. / Miklós, I.; Novák, Ádám; Satija, Rahul; Lyngsø, Rune; Hein, Jotun.

In: Statistical Methods in Medical Research, Vol. 18, No. 5, 2009, p. 453-485.

@article{6493c272638248749bafa615ab1be7aa,
title = "Stochastic models of sequence evolution including insertion-deletion events",
author = "I. Mikl{\'o}s and {\'A}d{\'a}m Nov{\'a}k and Rahul Satija and Rune Lyngs{\o} and Jotun Hein",
year = "2009",
doi = "10.1177/0962280208099500",
language = "English",
volume = "18",
pages = "453--485",
journal = "Statistical Methods in Medical Research",
issn = "0962-2802",
publisher = "SAGE Publications Ltd",
number = "5",

}

TY - JOUR

T1 - Stochastic models of sequence evolution including insertion-deletion events

AU - Miklós, I.

AU - Novák, Ádám

AU - Satija, Rahul

AU - Lyngsø, Rune

AU - Hein, Jotun

PY - 2009

Y1 - 2009

UR - http://www.scopus.com/inward/record.url?scp=70349738209&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70349738209&partnerID=8YFLogxK

U2 - 10.1177/0962280208099500

DO - 10.1177/0962280208099500

M3 - Article

VL - 18

SP - 453

EP - 485

JO - Statistical Methods in Medical Research

JF - Statistical Methods in Medical Research

SN - 0962-2802

IS - 5

ER -