Bayesian sampling of genomic rearrangement scenarios via double cut and join

I. Miklós, Eric Tannier

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

Motivation: When comparing the organization of two genomes, it is important not to draw conclusions on their modes of evolution from a single most parsimonious scenario explaining their differences. Better estimations can be obtained by sampling many different genomic rearrangement scenarios. For this problem, the Double Cut and Join (DCJ) model, while less relevant, is computationally easier than the Hannenhalli-Pevzner (HP) model. Indeed, in some special cases, the total number of DCJ sorting scenarios can be analytically calculated, and uniformly distributed random DCJ scenarios can be drawn in polynomial running time, while the complexity of counting the number of HP scenarios and sampling from the uniform distribution of their space is unknown, and conjectured to be #P-complete. Statistical methods, like Markov chain Monte Carlo (MCMC), for sampling from the uniform distribution of the most parsimonious or the Bayesian distribution of all possible HP scenarios are required.

Results: We use the computational facilities of the DCJ model to draw a sampling of HP scenarios. It is based on a parallel MCMC method that cools down DCJ scenarios to HP scenarios. We introduce two theorems underlying the theoretical mixing properties of this parallel MCMC method. The method was tested on yeast and mammalian genomic data, and allowed us to provide estimates of the different modes of evolution in diverse lineages.
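The abstract describes a parallel MCMC method in which chains are "cooled" from an easy-to-sample space toward the target one. The following is only a minimal sketch of the general technique (Metropolis-coupled MCMC, also called parallel tempering) on a toy problem; it is not the authors' algorithm. The ring of integer states, the function `toy_energy`, the temperature ladder, and all names below are assumptions made for illustration; in the paper's setting the states would be rearrangement scenarios rather than integers.

```python
import math
import random


def toy_energy(x):
    """Hypothetical two-well energy on states 0..19 (wells at 3 and 13)."""
    return 2 * min(abs(x - 3), abs(x - 13))


def parallel_tempering(energy, n_states, temperatures, n_steps, seed=0):
    """Metropolis-coupled MCMC on the ring of states 0..n_states-1.

    Chain i targets a distribution proportional to
    exp(-energy(x) / temperatures[i]). At least two temperatures are
    required. Returns the states visited by chain 0, which should be
    the coldest chain (temperatures listed from cold to hot).
    """
    rng = random.Random(seed)
    states = [rng.randrange(n_states) for _ in temperatures]
    cold_samples = []
    for _ in range(n_steps):
        # Within-chain Metropolis update with a symmetric +/-1 proposal.
        for i, temp in enumerate(temperatures):
            proposal = (states[i] + rng.choice((-1, 1))) % n_states
            delta = energy(proposal) - energy(states[i])
            if delta <= 0 or rng.random() < math.exp(-delta / temp):
                states[i] = proposal
        # Propose swapping the states of one random pair of adjacent chains.
        i = rng.randrange(len(temperatures) - 1)
        log_ratio = (energy(states[i]) - energy(states[i + 1])) * (
            1.0 / temperatures[i] - 1.0 / temperatures[i + 1]
        )
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            states[i], states[i + 1] = states[i + 1], states[i]
        cold_samples.append(states[0])
    return cold_samples


# Usage: the hot chains cross the energy barrier easily and pass states
# down the ladder, so the cold chain is not stuck in its starting well.
samples = parallel_tempering(toy_energy, 20, [1.0, 3.0, 10.0], 5000, seed=1)
near_wells = sum(1 for s in samples if toy_energy(s) <= 4) / len(samples)
print(f"fraction of cold-chain samples near a well: {near_wells:.3f}")
```

The swap step accepts with probability min(1, exp((E_i - E_j)(1/T_i - 1/T_j))), which leaves the joint distribution of all chains invariant, so the cold chain still samples its own target while benefiting from the faster mixing of the hot chains.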

Original language: English
Pages (from-to): 3012-3019
Number of pages: 8
Journal: Bioinformatics
Volume: 26
Issue number: 24
DOI: 10.1093/bioinformatics/btq574
Publication status: Published - Dec 2010

Fingerprint

Genomic Rearrangements
Markov Chains
Join
Monte Carlo Method
Markov processes
Sampling
Scenarios
Monte Carlo methods
Yeasts
Markov Chain Monte Carlo Methods
Sorting
Genome
Yeast
Uniform distribution
Statistical methods
Genes
Polynomials
Markov Chain Monte Carlo
Statistical method
Genomics

ASJC Scopus subject areas

  • Biochemistry
  • Molecular Biology
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Computational Mathematics
  • Statistics and Probability
  • Medicine (all)

Cite this

Bayesian sampling of genomic rearrangement scenarios via double cut and join. / Miklós, I.; Tannier, Eric.

In: Bioinformatics, Vol. 26, No. 24, 12.2010, p. 3012-3019.

@article{9ddf3b3a52d543eeba558ce956c97f94,
title = "Bayesian sampling of genomic rearrangement scenarios via double cut and join",
abstract = "Motivation: When comparing the organization of two genomes, it is important not to draw conclusions on their modes of evolution from a single most parsimonious scenario explaining their differences. Better estimations can be obtained by sampling many different genomic rearrangement scenarios. For this problem, the Double Cut and Join (DCJ) model, while less relevant, is computationally easier than the Hannenhalli-Pevzner (HP) model. Indeed, in some special cases, the total number of DCJ sorting scenarios can be analytically calculated, and uniformly distributed random DCJ scenarios can be drawn in polynomial running time, while the complexity of counting the number of HP scenarios and sampling from the uniform distribution of their space is unknown, and conjectured to be \#P-complete. Statistical methods, like Markov chain Monte Carlo (MCMC), for sampling from the uniform distribution of the most parsimonious or the Bayesian distribution of all possible HP scenarios are required. Results: We use the computational facilities of the DCJ model to draw a sampling of HP scenarios. It is based on a parallel MCMC method that cools down DCJ scenarios to HP scenarios. We introduce two theorems underlying the theoretical mixing properties of this parallel MCMC method. The method was tested on yeast and mammalian genomic data, and allowed us to provide estimates of the different modes of evolution in diverse lineages.",
author = "I. Mikl{\'o}s and Eric Tannier",
year = "2010",
month = "12",
doi = "10.1093/bioinformatics/btq574",
language = "English",
volume = "26",
pages = "3012--3019",
journal = "Bioinformatics",
issn = "1367-4803",
publisher = "Oxford University Press",
number = "24",

}

TY - JOUR

T1 - Bayesian sampling of genomic rearrangement scenarios via double cut and join

AU - Miklós, I.

AU - Tannier, Eric

PY - 2010/12

Y1 - 2010/12

N2 - Motivation: When comparing the organization of two genomes, it is important not to draw conclusions on their modes of evolution from a single most parsimonious scenario explaining their differences. Better estimations can be obtained by sampling many different genomic rearrangement scenarios. For this problem, the Double Cut and Join (DCJ) model, while less relevant, is computationally easier than the Hannenhalli-Pevzner (HP) model. Indeed, in some special cases, the total number of DCJ sorting scenarios can be analytically calculated, and uniformly distributed random DCJ scenarios can be drawn in polynomial running time, while the complexity of counting the number of HP scenarios and sampling from the uniform distribution of their space is unknown, and conjectured to be #P-complete. Statistical methods, like Markov chain Monte Carlo (MCMC) for sampling from the uniform distribution of the most parsimonious or the Bayesian distribution of all possible HP scenarios are required. Results: We use the computational facilities of the DCJ model to draw a sampling of HP scenarios. It is based on a parallel MCMC method that cools down DCJ scenarios to HP scenarios. We introduce two theorems underlying the theoretical mixing properties of this parallel MCMC method. The method was tested on yeast and mammalian genomic data, and allowed us to provide estimates of the different modes of evolution in diverse lineages.

UR - http://www.scopus.com/inward/record.url?scp=79951749932&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=79951749932&partnerID=8YFLogxK

U2 - 10.1093/bioinformatics/btq574

DO - 10.1093/bioinformatics/btq574

M3 - Article

C2 - 21037244

AN - SCOPUS:79951749932

VL - 26

SP - 3012

EP - 3019

JO - Bioinformatics

JF - Bioinformatics

SN - 1367-4803

IS - 24

ER -