Current composite-feature classification methods do not outperform simple single-genes classifiers in breast cancer prognosis

Christine Staiger, Sidney Cadot, B. Györffy, Lodewyk F A Wessels, Gunnar W. Klau

Research output: Contribution to journal (Article)

22 Citations (Scopus)

Abstract

Integrating gene expression data with secondary data such as pathway or protein-protein interaction data has been proposed as a promising approach for improved outcome prediction of cancer patients. Methods employing this approach usually aggregate the expression of genes into new composite features, while the secondary data guide this aggregation. Previous studies were limited to a few data sets with a small number of patients. Moreover, each study used different data and evaluation procedures, which makes it difficult to objectively assess the gain in classification performance. Here we introduce the Amsterdam Classification Evaluation Suite (ACES). ACES is a Python package to objectively evaluate classification and feature-selection methods and contains methods for pooling and normalizing Affymetrix microarrays from different studies. It is simple to use and therefore facilitates the comparison of new approaches to best-in-class approaches. In addition to the methods described in our earlier study (Staiger et al., 2012), we have included two prominent prognostic gene signatures specific for breast cancer outcome, one more composite feature selection method and two network-based gene ranking methods. Employing the evaluation pipeline, we show that current composite-feature classification methods do not outperform simple single-gene classifiers in predicting outcome in breast cancer. Furthermore, we find that the stability of features across different data sets is also not higher for composite features. Most strikingly, we observe that prediction performance is not affected when extracting features from randomized PPI networks.
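
To make the notion of a composite feature concrete, the sketch below averages the expression of the genes in a gene set (for example a pathway or a PPI subnetwork) for each patient. It is a minimal illustration only and is not the ACES API: the function name `composite_features`, the dictionary-based data layout, the toy gene names, and the choice of a plain mean as the aggregation rule are all assumptions made for this example.

```python
from statistics import mean


def composite_features(expression, gene_sets):
    """Average per-patient expression over each gene set.

    expression: dict mapping gene -> list of expression values, one per patient
    gene_sets:  dict mapping feature name -> list of genes (hypothetical sets,
                e.g. a pathway or a PPI subnetwork); genes that were not
                measured are skipped
    """
    features = {}
    for name, genes in gene_sets.items():
        present = [g for g in genes if g in expression]
        if not present:
            continue  # no measured genes, so no feature can be formed
        # zip(*...) walks the patients; each tuple holds one patient's values
        per_patient = zip(*(expression[g] for g in present))
        features[name] = [mean(values) for values in per_patient]
    return features


# Hypothetical toy data: three genes measured in four patients.
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0],
    "GENE_B": [3.0, 2.0, 1.0, 0.0],
    "GENE_C": [0.5, 0.5, 1.5, 1.5],
}
gene_sets = {
    "set_1": ["GENE_A", "GENE_B"],
    "set_2": ["GENE_C", "GENE_X"],  # GENE_X is not measured and is ignored
}

features = composite_features(expression, gene_sets)
```

A single-gene classifier would use the rows of `expression` directly as features, whereas a composite-feature classifier would be trained on `features`. Shuffling which genes belong to which set gives a rough stand-in for the randomized-network control mentioned in the abstract.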

Original language: English
Article number: 00289
Journal: Frontiers in Genetics
Volume: 4
Issue number: DEC
DOI: 10.3389/fgene.2013.00289
Publication status: Published - 2013

Keywords

  • Breast cancer
  • Classification
  • Evaluation
  • Feature selection
  • Networks
  • Outcome prediction

ASJC Scopus subject areas

  • Genetics
  • Molecular Medicine
  • Genetics (clinical)

Cite this

Current composite-feature classification methods do not outperform simple single-genes classifiers in breast cancer prognosis. / Staiger, Christine; Cadot, Sidney; Györffy, B.; Wessels, Lodewyk F A; Klau, Gunnar W.

In: Frontiers in Genetics, Vol. 4, No. DEC, 00289, 2013.

@article{289cbead78b441ce8e8c629dc31e50ac,
title = "Current composite-feature classification methods do not outperform simple single-genes classifiers in breast cancer prognosis",
abstract = "Integrating gene expression data with secondary data such as pathway or protein-protein interaction data has been proposed as a promising approach for improved outcome prediction of cancer patients. Methods employing this approach usually aggregate the expression of genes into new composite features, while the secondary data guide this aggregation. Previous studies were limited to few data sets with a small number of patients. Moreover, each study used different data and evaluation procedures. This makes it difficult to objectively assess the gain in classification performance. Here we introduce the Amsterdam Classification Evaluation Suite (ACES). ACES is a Python package to objectively evaluate classification and feature-selection methods and contains methods for pooling and normalizing Affymetrix microarrays from different studies. It is simple to use and therefore facilitates the comparison of new approaches to best-in-class approaches. In addition to the methods described in our earlier study (Staiger et al., 2012), we have included two prominent prognostic gene signatures specific for breast cancer outcome, one more composite feature selection method and two network-based gene ranking methods. Employing the evaluation pipeline we show that current composite-feature classification methods do not outperform simple single-genes classifiers in predicting outcome in breast cancer. Furthermore, we find that also the stability of features across different data sets is not higher for composite features. Most stunningly, we observe that prediction performances are not affected when extracting features from randomized PPI networks.",
keywords = "Breast cancer, Classification, Evaluation, Feature selection, Networks, Outcome prediction",
author = "Christine Staiger and Sidney Cadot and B. Gy{\"o}rffy and Wessels, {Lodewyk F A} and Klau, {Gunnar W.}",
year = "2013",
doi = "10.3389/fgene.2013.00289",
language = "English",
volume = "4",
journal = "Frontiers in Genetics",
issn = "1664-8021",
publisher = "Frontiers Media S. A.",
number = "DEC",

}

TY - JOUR

T1 - Current composite-feature classification methods do not outperform simple single-genes classifiers in breast cancer prognosis

AU - Staiger, Christine

AU - Cadot, Sidney

AU - Györffy, B.

AU - Wessels, Lodewyk F A

AU - Klau, Gunnar W.

PY - 2013

Y1 - 2013

N2 - Integrating gene expression data with secondary data such as pathway or protein-protein interaction data has been proposed as a promising approach for improved outcome prediction of cancer patients. Methods employing this approach usually aggregate the expression of genes into new composite features, while the secondary data guide this aggregation. Previous studies were limited to few data sets with a small number of patients. Moreover, each study used different data and evaluation procedures. This makes it difficult to objectively assess the gain in classification performance. Here we introduce the Amsterdam Classification Evaluation Suite (ACES). ACES is a Python package to objectively evaluate classification and feature-selection methods and contains methods for pooling and normalizing Affymetrix microarrays from different studies. It is simple to use and therefore facilitates the comparison of new approaches to best-in-class approaches. In addition to the methods described in our earlier study (Staiger et al., 2012), we have included two prominent prognostic gene signatures specific for breast cancer outcome, one more composite feature selection method and two network-based gene ranking methods. Employing the evaluation pipeline we show that current composite-feature classification methods do not outperform simple single-genes classifiers in predicting outcome in breast cancer. Furthermore, we find that also the stability of features across different data sets is not higher for composite features. Most stunningly, we observe that prediction performances are not affected when extracting features from randomized PPI networks.

KW - Breast cancer

KW - Classification

KW - Evaluation

KW - Feature selection

KW - Networks

KW - Outcome prediction

UR - http://www.scopus.com/inward/record.url?scp=84892385433&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84892385433&partnerID=8YFLogxK

U2 - 10.3389/fgene.2013.00289

DO - 10.3389/fgene.2013.00289

M3 - Article

VL - 4

JO - Frontiers in Genetics

JF - Frontiers in Genetics

SN - 1664-8021

IS - DEC

M1 - 00289

ER -