A genome-wide approach to link genotype to clinical outcome by utilizing next generation sequencing and gene chip data of 6,697 breast cancer patients

Lorinc Pongor, Máté Kormos, Christos Hatzis, Lajos Pusztai, András Szabó, B. Györffy

Research output: Contribution to journal › Article

27 Citations (Scopus)

Abstract

Background: The use of somatic mutations for predicting clinical outcome is difficult because a mutation can indirectly influence the function of many genes, and also because clinical follow-up is sparse in the relatively young next generation sequencing (NGS) databanks. Here we approach this problem by linking sequence databanks to well-annotated gene-chip datasets, using a multigene transcriptomic fingerprint as a link between gene mutations and gene expression in breast cancer patients.

Methods: The database consists of 763 NGS samples containing mutational status for 22,938 genes and RNA-seq data for 10,987 genes. The gene chip database contains 5,934 patients with 10,987 genes plus clinical characteristics. For the prediction, mutations present in a sample are first translated into a 'transcriptomic fingerprint' by running ROC analysis on mutation and RNA-seq data. Then correlation to survival is assessed by computing Cox regression for both up- and downregulated signatures.

Results: According to this approach, the top driver oncogenes having a mutation prevalence over 5% included AKT1, TRANK1, TRAPPC10, RPGR, COL6A2, RAPGEF4, ATG2B, CNTRL, NAA38, OSBPL10, POTEF, SCLT1, SUN1, VWDE, MTUS2, and PIK3CA, and the top tumor suppressor genes included PHEX, TP53, GGA3, RGS22, PXDNL, ARFGEF1, BRCA2, CHD8, GCC2, and ARMC4. The system was validated by computing correlation between RNA-seq and microarray data (r² = 0.73, P < 1E-16). Cross-validation using 20 genes with a prevalence of approximately 5% confirmed analysis reproducibility.

Conclusions: We established a pipeline enabling rapid clinical validation of a discovered mutation in a large breast cancer cohort. An online interface is available for evaluating any human gene mutation or combinations of up to three such genes (http://www.g-2-o.com).
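The fingerprint step described in the Methods can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy expression values, the AUC cutoff of 0.8, and the use of a mean expression value as the signature score are all assumptions made for illustration. The sketch ranks genes by how well their expression separates mutated from wild-type samples (ROC AUC) and splits them into up- and downregulated signatures. The Cox regression step against survival is omitted, since it needs follow-up data and a survival analysis library.

```python
from statistics import mean


def auc(pos, neg):
    """ROC AUC: probability that a mutated sample has higher expression
    than a wild-type sample, with ties counted as one half."""
    if not pos or not neg:
        raise ValueError("both groups need at least one sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def fingerprint(expression, mutated, threshold=0.8):
    """Split genes into up- and downregulated signatures.

    expression: dict mapping gene name -> list of expression values
                (one value per sample).
    mutated:    list of booleans, True where the sample carries the mutation.
    threshold:  hypothetical AUC cutoff, not taken from the paper.
    """
    up, down = [], []
    for gene, values in expression.items():
        pos = [v for v, m in zip(values, mutated) if m]
        neg = [v for v, m in zip(values, mutated) if not m]
        score = auc(pos, neg)
        if score >= threshold:
            up.append(gene)
        elif score <= 1.0 - threshold:
            down.append(gene)
    return up, down


def signature_score(sample_expression, genes):
    """Mean expression of the signature genes in one sample (an assumed
    summary; the paper may combine genes differently)."""
    return mean(sample_expression[g] for g in genes)


# Toy data: three mutated and three wild-type samples, hypothetical genes.
expression = {
    "GENE_A": [9.1, 8.7, 9.4, 2.0, 2.5, 1.8],
    "GENE_B": [1.0, 1.2, 0.9, 6.3, 5.8, 6.1],
    "GENE_C": [4.0, 5.0, 4.5, 4.2, 5.1, 4.4],
}
mutated = [True, True, True, False, False, False]

up, down = fingerprint(expression, mutated)
print("upregulated:", up)
print("downregulated:", down)
```

In a setting like the one the abstract describes, the signature score would be computed for each patient in the gene chip cohort and then used as the covariate in a Cox regression against survival time.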

Original language: English
Journal: Genome Medicine
Volume: 7
Issue number: 1
DOI: 10.1186/s13073-015-0228-1
Publication status: Published - Oct 16 2015

Fingerprint

Oligonucleotide Array Sequence Analysis
Genotype
Genome
Breast Neoplasms
Mutation
Genes
Databases
Dermatoglyphics
RNA
Tumor Suppressor Genes
Oncogenes
ROC Curve
Running
Down-Regulation
Gene Expression
Survival

ASJC Scopus subject areas

  • Genetics (clinical)
  • Genetics
  • Molecular Biology
  • Molecular Medicine

Cite this

A genome-wide approach to link genotype to clinical outcome by utilizing next generation sequencing and gene chip data of 6,697 breast cancer patients. / Pongor, Lorinc; Kormos, Máté; Hatzis, Christos; Pusztai, Lajos; Szabó, András; Györffy, B.

In: Genome Medicine, Vol. 7, No. 1, 16.10.2015.

@article{44200fd5abd34f9f918387d0856fda49,
title = "A genome-wide approach to link genotype to clinical outcome by utilizing next generation sequencing and gene chip data of 6,697 breast cancer patients",
abstract = "Background: The use of somatic mutations for predicting clinical outcome is difficult because a mutation can indirectly influence the function of many genes, and also because clinical follow-up is sparse in the relatively young next generation sequencing (NGS) databanks. Here we approach this problem by linking sequence databanks to well annotated gene-chip datasets, using a multigene transcriptomic fingerprint as a link between gene mutations and gene expression in breast cancer patients. Methods: The database consists of 763 NGS samples containing mutational status for 22,938 genes and RNA-seq data for 10,987 genes. The gene chip database contains 5,934 patients with 10,987 genes plus clinical characteristics. For the prediction, mutations present in a sample are first translated into a 'transcriptomic fingerprint' by running ROC analysis on mutation and RNA-seq data. Then correlation to survival is assessed by computing Cox regression for both up- and downregulated signatures. Results: According to this approach, the top driver oncogenes having a mutation prevalence over 5 {\%} included AKT1, TRANK1, TRAPPC10, RPGR, COL6A2, RAPGEF4, ATG2B, CNTRL, NAA38, OSBPL10, POTEF, SCLT1, SUN1, VWDE, MTUS2, and PIK3CA, and the top tumor suppressor genes included PHEX, TP53, GGA3, RGS22, PXDNL, ARFGEF1, BRCA2, CHD8, GCC2, and ARMC4. The system was validated by computing correlation between RNA-seq and microarray data (r2 = 0.73, P <1E-16). Cross-validation using 20 genes with a prevalence of approximately 5 {\%} confirmed analysis reproducibility. Conclusions: We established a pipeline enabling rapid clinical validation of a discovered mutation in a large breast cancer cohort. An online interface is available for evaluating any human gene mutation or combinations of maximum three such genes (http://www.g-2-o.com ).",
author = "Lorinc Pongor and M{\'a}t{\'e} Kormos and Christos Hatzis and Lajos Pusztai and Andr{\'a}s Szab{\'o} and B. Gy{\"o}rffy",
year = "2015",
month = "10",
day = "16",
doi = "10.1186/s13073-015-0228-1",
language = "English",
volume = "7",
journal = "Genome Medicine",
issn = "1756-994X",
publisher = "BioMed Central",
number = "1",
}

TY - JOUR

T1 - A genome-wide approach to link genotype to clinical outcome by utilizing next generation sequencing and gene chip data of 6,697 breast cancer patients

AU - Pongor, Lorinc

AU - Kormos, Máté

AU - Hatzis, Christos

AU - Pusztai, Lajos

AU - Szabó, András

AU - Györffy, B.

PY - 2015/10/16

Y1 - 2015/10/16

AB - Background: The use of somatic mutations for predicting clinical outcome is difficult because a mutation can indirectly influence the function of many genes, and also because clinical follow-up is sparse in the relatively young next generation sequencing (NGS) databanks. Here we approach this problem by linking sequence databanks to well annotated gene-chip datasets, using a multigene transcriptomic fingerprint as a link between gene mutations and gene expression in breast cancer patients. Methods: The database consists of 763 NGS samples containing mutational status for 22,938 genes and RNA-seq data for 10,987 genes. The gene chip database contains 5,934 patients with 10,987 genes plus clinical characteristics. For the prediction, mutations present in a sample are first translated into a 'transcriptomic fingerprint' by running ROC analysis on mutation and RNA-seq data. Then correlation to survival is assessed by computing Cox regression for both up- and downregulated signatures. Results: According to this approach, the top driver oncogenes having a mutation prevalence over 5 % included AKT1, TRANK1, TRAPPC10, RPGR, COL6A2, RAPGEF4, ATG2B, CNTRL, NAA38, OSBPL10, POTEF, SCLT1, SUN1, VWDE, MTUS2, and PIK3CA, and the top tumor suppressor genes included PHEX, TP53, GGA3, RGS22, PXDNL, ARFGEF1, BRCA2, CHD8, GCC2, and ARMC4. The system was validated by computing correlation between RNA-seq and microarray data (r2 = 0.73, P <1E-16). Cross-validation using 20 genes with a prevalence of approximately 5 % confirmed analysis reproducibility. Conclusions: We established a pipeline enabling rapid clinical validation of a discovered mutation in a large breast cancer cohort. An online interface is available for evaluating any human gene mutation or combinations of maximum three such genes (http://www.g-2-o.com ).

UR - http://www.scopus.com/inward/record.url?scp=84945231686&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84945231686&partnerID=8YFLogxK

U2 - 10.1186/s13073-015-0228-1

DO - 10.1186/s13073-015-0228-1

M3 - Article

C2 - 26474971

AN - SCOPUS:84945231686

VL - 7

JO - Genome Medicine

JF - Genome Medicine

SN - 1756-994X

IS - 1

ER -