BigFoot: Bayesian alignment and phylogenetic footprinting with MCMC

Rahul Satija, Ádám Novák, I. Miklós, Rune Lyngsø, Jotun Hein

Research output: Contribution to journal (Article)

20 Citations (Scopus)

Abstract

Background. We have previously combined statistical alignment and phylogenetic footprinting to detect conserved functional elements without assuming a fixed alignment. Considering a probability-weighted distribution of alignments removes sensitivity to alignment errors, properly accommodates regions of alignment uncertainty, and increases the accuracy of functional element prediction. Our method used standard dynamic programming algorithms for hidden Markov models to analyze up to four sequences.

Results. We present a novel approach, implemented in the software package BigFoot, for performing phylogenetic footprinting on larger numbers of sequences. We have developed a Markov chain Monte Carlo (MCMC) approach that samples both sequence alignments and the locations of slowly evolving regions. We implement our method as an extension of the existing StatAlign software package and test it on well-annotated regions controlling the expression of the even-skipped gene in Drosophila and the β-globin gene in vertebrates. The results show how adding sequences to the analysis has the potential to improve the accuracy of functional predictions, and demonstrate that BigFoot outperforms existing alignment-based phylogenetic footprinting techniques.

Conclusion. BigFoot extends a combined alignment and phylogenetic footprinting approach to analyze larger amounts of sequence data using MCMC. Our approach is robust to alignment error and uncertainty and can be applied to a variety of biological datasets. The source code and documentation are publicly available for download from http://www.stats.ox.ac.uk/~satija/BigFoot/.

Original language: English
Article number: 217
Journal: BMC Evolutionary Biology
Volume: 9
Issue number: 1
DOI: 10.1186/1471-2148-9-217
Publication status: Published - 2009

Fingerprint

Markov chain
phylogenetics
phylogeny
uncertainty
dynamic programming
prediction
sequence alignment
Drosophila
genes
methodology
vertebrates
software
alignment
gene
testing
vertebrate
sampling

ASJC Scopus subject areas

  • Ecology, Evolution, Behavior and Systematics

Cite this

BigFoot: Bayesian alignment and phylogenetic footprinting with MCMC. / Satija, Rahul; Novák, Ádám; Miklós, I.; Lyngsø, Rune; Hein, Jotun.

In: BMC Evolutionary Biology, Vol. 9, No. 1, 217, 2009.

Satija, Rahul; Novák, Ádám; Miklós, I.; Lyngsø, Rune; Hein, Jotun. / BigFoot: Bayesian alignment and phylogenetic footprinting with MCMC. In: BMC Evolutionary Biology. 2009; Vol. 9, No. 1.
@article{a5a8874721f24ca3b145f93571423c99,
title = "BigFoot: Bayesian alignment and phylogenetic footprinting with MCMC",
author = "Rahul Satija and {\'A}d{\'a}m Nov{\'a}k and I. Mikl{\'o}s and Rune Lyngs{\o} and Jotun Hein",
year = "2009",
doi = "10.1186/1471-2148-9-217",
language = "English",
volume = "9",
journal = "BMC Evolutionary Biology",
issn = "1471-2148",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - BigFoot

T2 - Bayesian alignment and phylogenetic footprinting with MCMC

AU - Satija, Rahul

AU - Novák, Ádám

AU - Miklós, I.

AU - Lyngsø, Rune

AU - Hein, Jotun

PY - 2009

Y1 - 2009

UR - http://www.scopus.com/inward/record.url?scp=70349205853&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=70349205853&partnerID=8YFLogxK

U2 - 10.1186/1471-2148-9-217

DO - 10.1186/1471-2148-9-217

M3 - Article

C2 - 19715598

AN - SCOPUS:70349205853

VL - 9

JO - BMC Evolutionary Biology

JF - BMC Evolutionary Biology

SN - 1471-2148

IS - 1

M1 - 217

ER -