GFam: A platform for automatic annotation of gene families

Rajkumar Sasidharan, Tamás Nepusz, David Swarbreck, Eva Huala, Alberto Paccanaro

Research output: Contribution to journal › Article


We have developed GFam, a platform for automatic annotation of gene/protein families. GFam provides a framework for genome initiatives and model organism resources to build domain-based families, derive meaningful functional labels and propagate functional annotation seamlessly across periodic genome updates. GFam is a hybrid approach: it uses a greedy algorithm to chain component domains from the InterPro annotation provided by its 12 member resources, then applies a sequence-based connected component analysis to the unannotated sequence regions. From these it derives a consensus domain architecture for each sequence and subsequently generates families based on common architectures. For the proteome of Arabidopsis, our integrated approach increases sequence coverage by 7.2 percentage points and residue coverage by 14.6 percentage points relative to the best single constituent database within InterPro. The true power of GFam lies in maximizing the annotation provided by the different InterPro data sources, each of which offers resource-specific coverage for different regions of a sequence. GFam's ability to achieve higher sequence and residue coverage can be useful for genome annotation, comparative genomics and functional studies. GFam is general-purpose software and can be used for any collection of protein sequences. The software is open source and can be obtained from
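The domain-chaining and family-building steps described in the abstract can be illustrated with a minimal Python sketch. This is not GFam's code or its API: the `Hit` type, the function names, the longest-first selection rule, the `max_overlap` parameter and the toy domain labels are all assumptions made for illustration, and the sequence-based connected component analysis of unannotated regions is left out entirely.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple, Tuple


class Hit(NamedTuple):
    """One domain assignment on a sequence (hypothetical structure)."""
    domain: str  # illustrative domain label, not a real InterPro ID
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive


def overlap(a: Hit, b: Hit) -> int:
    """Number of residues shared by two hits (0 if they are disjoint)."""
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def greedy_architecture(hits: List[Hit], max_overlap: int = 0) -> List[Hit]:
    """Greedily keep hits, longest first, skipping any hit that overlaps
    an already kept hit by more than max_overlap residues.
    The longest-first rule is an assumption of this sketch."""
    chosen: List[Hit] = []
    for hit in sorted(hits, key=lambda h: (-(h.end - h.start + 1), h.start)):
        if all(overlap(hit, kept) <= max_overlap for kept in chosen):
            chosen.append(hit)
    # Report the architecture in N- to C-terminal order.
    return sorted(chosen, key=lambda h: h.start)


def build_families(
    hits_by_seq: Dict[str, List[Hit]]
) -> Dict[Tuple[str, ...], List[str]]:
    """Group sequences that share the same ordered domain architecture."""
    families: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for seq_id, hits in hits_by_seq.items():
        architecture = tuple(h.domain for h in greedy_architecture(hits))
        families[architecture].append(seq_id)
    return dict(families)


def residue_coverage(hits: List[Hit], seq_len: int) -> float:
    """Fraction of residues covered by the chosen, chained domains."""
    covered = set()
    for h in greedy_architecture(hits):
        covered.update(range(h.start, h.end + 1))
    return len(covered) / seq_len


# Toy input: DOM_A and DOM_B compete for the same region of seqA.
hits_by_seq = {
    "seqA": [Hit("DOM_A", 10, 260), Hit("DOM_B", 12, 255), Hit("DOM_C", 300, 380)],
    "seqB": [Hit("DOM_A", 5, 250), Hit("DOM_C", 290, 370)],
    "seqC": [Hit("DOM_D", 1, 40)],
}

families = build_families(hits_by_seq)
for architecture, members in families.items():
    print(" + ".join(architecture), "->", members)
```

In this toy input, seqA and seqB end up in the same family because the shorter, overlapping DOM_B hit is discarded in favour of DOM_A. Combining hits from several sources in this way is what lets the chosen architecture cover more residues than any one source alone, which is the effect the coverage figures above describe.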

Original language: English
Pages (from-to): e152
Journal: Nucleic acids research
Issue number: 19
Publication status: Published - Oct 1 2012


ASJC Scopus subject areas

  • Genetics

Cite this

Sasidharan, R., Nepusz, T., Swarbreck, D., Huala, E., & Paccanaro, A. (2012). GFam: A platform for automatic annotation of gene families. Nucleic acids research, 40(19), e152.