Quality control of gene predictions

A. Nagy, H. Hegyi, K. Farkas, H. Tordai, E. Kozma, L. Bányai, L. Patthy

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

A recent study systematically compared the performance of various computational methods for predicting human protein-coding genes (Guigó et al. 2006). In that study, a set of well-annotated ENCODE sequences was blind-analyzed with different gene-finding programs, and the resulting predictions were compared with the annotations. Predictions were analyzed at the nucleotide, exon, transcript and gene levels to evaluate how well they reproduced the annotation. The comparison revealed that none of the strategies produced perfect predictions, but methods that relied on mRNA and protein sequences and those that used combined information (including expressed sequence information) were generally the most accurate. The dual- or multiple-genome methods were less accurate, although they performed better than the single-genome ab initio prediction methods. Importantly, at the nucleotide level no prediction method correctly identified more than ∼90% of nucleotides, and at the transcript level (the most stringent criterion) no prediction method correctly identified more than 45% of the coding transcripts.
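To make the evaluation levels named in the abstract concrete, the following is a minimal sketch of how a predicted gene structure might be scored against an annotation at the nucleotide, exon and transcript levels. It is not the procedure used by Guigó et al. or by the chapter authors. The half-open coordinate convention, the function names, and the definitions of sensitivity (fraction of annotated features recovered) and precision (fraction of predicted features that are annotated) are all assumptions made for illustration.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical convention: an exon is a half-open interval [start, end).
Exon = Tuple[int, int]


def covered_positions(exons: Iterable[Exon]) -> Set[int]:
    """Return the set of nucleotide positions covered by the exons."""
    positions: Set[int] = set()
    for start, end in exons:
        positions.update(range(start, end))
    return positions


def ratio(hits: int, total: int) -> float:
    """Safe division: return 0.0 when there is nothing to score against."""
    return hits / total if total else 0.0


def nucleotide_level(annotated: List[Exon], predicted: List[Exon]) -> Tuple[float, float]:
    """Sensitivity and precision over individual coding nucleotides."""
    truth = covered_positions(annotated)
    pred = covered_positions(predicted)
    shared = len(truth & pred)
    return ratio(shared, len(truth)), ratio(shared, len(pred))


def exon_level(annotated: List[Exon], predicted: List[Exon]) -> Tuple[float, float]:
    """Sensitivity and precision over exons; both boundaries must match exactly."""
    truth, pred = set(annotated), set(predicted)
    shared = len(truth & pred)
    return ratio(shared, len(truth)), ratio(shared, len(pred))


def transcript_level(annotated: List[List[Exon]], predicted: List[List[Exon]]) -> float:
    """Fraction of annotated transcripts whose full exon set is predicted exactly."""
    truth = {frozenset(t) for t in annotated}
    pred = {frozenset(t) for t in predicted}
    return ratio(len(truth & pred), len(truth))


if __name__ == "__main__":
    # Toy data (invented): the second predicted exon starts 10 nt too late.
    annotated = [(100, 200), (300, 400)]
    predicted = [(100, 200), (310, 400)]
    print("nucleotide:", nucleotide_level(annotated, predicted))
    print("exon:      ", exon_level(annotated, predicted))
    print("transcript:", transcript_level([annotated], [predicted]))
```

In this toy case a single misplaced exon boundary costs little at the nucleotide level but halves the exon-level scores and makes the whole transcript count as wrong, which illustrates why the transcript level is the most stringent criterion.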

Original language: English
Title of host publication: Modern Genome Annotation: The Biosapiens Network
Publisher: Springer-Verlag Wien
Pages: 41-52
Number of pages: 12
ISBN (Print): 9783211751237, 9783211751220
DOI: https://doi.org/10.1007/978-3-211-75123-7_3
Publication status: Published - 2008

Fingerprint

Quality Control
Genes
Nucleotides
Genome
Exons
Proteins
Messenger RNA

ASJC Scopus subject areas

  • Medicine(all)

Cite this

Nagy, A., Hegyi, H., Farkas, K., Tordai, H., Kozma, E., Bányai, L., & Patthy, L. (2008). Quality control of gene predictions. In Modern Genome Annotation: The Biosapiens Network (pp. 41-52). Springer-Verlag Wien. https://doi.org/10.1007/978-3-211-75123-7_3
