Streamlining and large ancestral genomes in archaea inferred with a phylogenetic birth-and-death model

Miklós Csurös, István Miklós

Research output: Contribution to journal › Article

81 Citations (Scopus)

Abstract

Homologous genes originate from a common ancestor through vertical inheritance, duplication, or horizontal gene transfer. Entire homolog families spawned by a single ancestral gene can be identified across multiple genomes based on protein sequence similarity. The sequences, however, do not always reveal conclusively the history of large families. To study the evolution of complete gene repertoires, we propose here a mathematical framework that does not rely on resolved gene family histories. We show that so-called phylogenetic profiles, formed by family sizes across multiple genomes, are sufficient to infer principal evolutionary trends. The main novelty in our approach is an efficient algorithm to compute the likelihood of a phylogenetic profile in a model of birth-and-death processes acting on a phylogeny. We examine known gene families in 28 archaeal genomes using a probabilistic model that involves lineage- and family-specific components of gene acquisition, duplication, and loss. The model enables us to consider all possible histories when inferring statistics about archaeal evolution. According to our reconstruction, most lineages are characterized by a net loss of gene families. Major increases in gene repertoire have occurred only a few times. Our reconstruction underlines the importance of persistent streamlining processes in shaping genome composition in Archaea. It also suggests that early archaeal genomes were as complex as typical modern ones, and even show signs, in the case of the methanogenic ancestor, of an extremely large gene repertoire.
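To make the central quantity concrete, the following is a minimal sketch of how the likelihood of one phylogenetic profile could be computed on a small tree. It is not the authors' efficient algorithm: it uses a generic pruning recursion over a truncated range of copy numbers, with transition probabilities obtained from a numerical matrix exponential. The tree shape, branch lengths, rate values, the cap max_n, the uniform root prior, and all function names are assumptions chosen for illustration only.

```python
def rate_matrix(gain, dup, loss, max_n):
    """Generator of a linear birth-death process with gain, truncated at max_n copies."""
    size = max_n + 1
    q = [[0.0] * size for _ in range(size)]
    for n in range(size):
        if n < max_n:
            q[n][n + 1] = gain + dup * n   # acquisition or duplication adds a copy
        if n > 0:
            q[n][n - 1] = loss * n         # loss removes a copy
        q[n][n] = -sum(q[n])               # each row of a generator sums to zero
    return q


def matmul(a, b):
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


def transition_matrix(q, t, squarings=10, terms=12):
    """Approximate exp(Q t) by scaling and squaring a truncated Taylor series."""
    size = len(q)
    scale = t / (2 ** squarings)
    a = [[q[i][j] * scale for j in range(size)] for i in range(size)]
    result = [[float(i == j) for j in range(size)] for i in range(size)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, a)]
        result = [[result[i][j] + term[i][j] for j in range(size)]
                  for i in range(size)]
    for _ in range(squarings):
        result = matmul(result, result)
    return result


def conditional(node, profile, q):
    """Return L[n] = P(observed family sizes below node | n copies at node).

    A leaf is a genome name (str); an internal node is a list of
    (child, branch_length) pairs.
    """
    size = len(q)
    if isinstance(node, str):
        return [1.0 if n == profile[node] else 0.0 for n in range(size)]
    like = [1.0] * size
    for child, length in node:
        p = transition_matrix(q, length)
        below = conditional(child, profile, q)
        for n in range(size):
            like[n] *= sum(p[n][m] * below[m] for m in range(size))
    return like


def profile_likelihood(tree, profile, q, root_prior):
    """Sum the root conditional likelihoods against a prior on the root copy number."""
    like = conditional(tree, profile, q)
    return sum(pr * l for pr, l in zip(root_prior, like))


# Hypothetical three-genome tree ((A, B), C) and one family's profile.
tree = [([("A", 0.3), ("B", 0.3)], 0.2), ("C", 0.5)]
profile = {"A": 2, "B": 1, "C": 0}
max_n = 8
q = rate_matrix(gain=0.1, dup=0.5, loss=1.0, max_n=max_n)
root_prior = [1.0 / (max_n + 1)] * (max_n + 1)  # uniform prior: an assumption
likelihood = profile_likelihood(tree, profile, q, root_prior)
print(likelihood)
```

Because the recursion sums over every copy number at every internal node, it accounts for all histories consistent with the observed family sizes, up to the truncation at max_n. The cost grows quickly with max_n, which is one reason a dedicated algorithm such as the one described in the abstract is needed for real data.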

Original language: English
Pages (from-to): 2087-2095
Number of pages: 9
Journal: Molecular Biology and Evolution
Volume: 26
Issue number: 9
DOI: 10.1093/molbev/msp123
Publication status: Published - Sep 2009

Keywords

  • Gene content evolution
  • Last archaeal common ancestor
  • Maximum likelihood

ASJC Scopus subject areas

  • Genetics
  • Molecular Biology
  • Ecology, Evolution, Behavior and Systematics

Cite this

Streamlining and large ancestral genomes in archaea inferred with a phylogenetic birth-and-death model. / Csurös, Miklós; Miklós, István.

In: Molecular Biology and Evolution, Vol. 26, No. 9, 09.2009, p. 2087-2095.

@article{8bd46ffd1308430c8054c8b2d82957e5,
title = "Streamlining and large ancestral genomes in archaea inferred with a phylogenetic birth-and-death model",
abstract = "Homologous genes originate from a common ancestor through vertical inheritance, duplication, or horizontal gene transfer. Entire homolog families spawned by a single ancestral gene can be identified across multiple genomes based on protein sequence similarity. The sequences, however, do not always reveal conclusively the history of large families. To study the evolution of complete gene repertoires, we propose here a mathematical framework that does not rely on resolved gene family histories. We show that so-called phylogenetic profiles, formed by family sizes across multiple genomes, are sufficient to infer principal evolutionary trends. The main novelty in our approach is an efficient algorithm to compute the likelihood of a phylogenetic profile in a model of birth-and-death processes acting on a phylogeny. We examine known gene families in 28 archaeal genomes using a probabilistic model that involves lineage- and family-specific components of gene acquisition, duplication, and loss. The model enables us to consider all possible histories when inferring statistics about archaeal evolution. According to our reconstruction, most lineages are characterized by a net loss of gene families. Major increases in gene repertoire have occurred only a few times. Our reconstruction underlines the importance of persistent streamlining processes in shaping genome composition in Archaea. It also suggests that early archaeal genomes were as complex as typical modern ones, and even show signs, in the case of the methanogenic ancestor, of an extremely large gene repertoire.",
keywords = "Gene content evolution, Last archaeal common ancestor, Maximum likelihood",
author = "Mikl{\'o}s Csur{\"o}s and Istv{\'a}n Mikl{\'o}s",
year = "2009",
month = "9",
doi = "10.1093/molbev/msp123",
language = "English",
volume = "26",
pages = "2087--2095",
journal = "Molecular Biology and Evolution",
issn = "0737-4038",
publisher = "Oxford University Press",
number = "9",
}

TY  - JOUR
T1  - Streamlining and large ancestral genomes in archaea inferred with a phylogenetic birth-and-death model
AU  - Csurös, Miklós
AU  - Miklós, István
PY  - 2009/9
Y1  - 2009/9
AB  - Homologous genes originate from a common ancestor through vertical inheritance, duplication, or horizontal gene transfer. Entire homolog families spawned by a single ancestral gene can be identified across multiple genomes based on protein sequence similarity. The sequences, however, do not always reveal conclusively the history of large families. To study the evolution of complete gene repertoires, we propose here a mathematical framework that does not rely on resolved gene family histories. We show that so-called phylogenetic profiles, formed by family sizes across multiple genomes, are sufficient to infer principal evolutionary trends. The main novelty in our approach is an efficient algorithm to compute the likelihood of a phylogenetic profile in a model of birth-and-death processes acting on a phylogeny. We examine known gene families in 28 archaeal genomes using a probabilistic model that involves lineage- and family-specific components of gene acquisition, duplication, and loss. The model enables us to consider all possible histories when inferring statistics about archaeal evolution. According to our reconstruction, most lineages are characterized by a net loss of gene families. Major increases in gene repertoire have occurred only a few times. Our reconstruction underlines the importance of persistent streamlining processes in shaping genome composition in Archaea. It also suggests that early archaeal genomes were as complex as typical modern ones, and even show signs, in the case of the methanogenic ancestor, of an extremely large gene repertoire.
KW  - Gene content evolution
KW  - Last archaeal common ancestor
KW  - Maximum likelihood
UR  - http://www.scopus.com/inward/record.url?scp=68949207909&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=68949207909&partnerID=8YFLogxK
U2  - 10.1093/molbev/msp123
DO  - 10.1093/molbev/msp123
M3  - Article
C2  - 19570746
AN  - SCOPUS:68949207909
VL  - 26
SP  - 2087
EP  - 2095
JO  - Molecular Biology and Evolution
JF  - Molecular Biology and Evolution
SN  - 0737-4038
IS  - 9
ER  -