Validation subset selections for extrapolation oriented QSPAR models

Csaba Szántai-Kis, István Kövesdi, G. Kéri, L. Őrfi

Research output: Contribution to journal › Article

7 Citations (Scopus)

Abstract

One of the most important features of a QSPAR model is its predictive ability, which should be checked by external validation. In this work we examined three types of external validation set selection methods for their usefulness in in-silico screening. The selection methods were evaluated as follows: 1) we generated thousands of QSPR models and stored them in 'model banks'; 2) we selected a final top model from the model banks using each of the three validation set selection methods; 3) we predicted large data sets, which we call 'chemical universe sets', and calculated the corresponding standard errors of prediction (SEPs). The models were generated from small fractions of the available water solubility data during a genetic algorithm (GA) variable subset selection procedure. The external validation sets were constructed by random selection, uniformly distributed selection, or perimeter-oriented selection. We found that the models performing best on the perimeter-oriented external validation sets usually gave the best validation results when the remaining part of the available data was overwhelmingly large, i.e., when the model had to extrapolate extensively. We also compared the top final models obtained with the three external validation set selection methods on three independent 'chemical universe sets' of different sizes.
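The abstract does not say exactly how the uniformly distributed and perimeter-oriented subsets are built. The following is therefore only a minimal Python sketch of the three selection strategies and of choosing a top model from a model bank by SEP. It rests on these assumptions:

* "Uniformly distributed" means evenly spaced ranks of the measured property.
* "Perimeter-oriented" means the compounds farthest from the descriptor-space centroid.
* SEP is the root-mean-square error on the external set.

The function names (`random_selection`, `uniform_selection`, `perimeter_selection`, `sep`, `pick_top_model`), the dictionary-of-callables model bank and the toy data are hypothetical illustrations. The GA variable subset selection step is not reproduced.

```python
import math
import random
from typing import Callable, Sequence

Vector = Sequence[float]
Model = Callable[[Vector], float]  # hypothetical: a model maps a descriptor vector to a prediction


def random_selection(n_items: int, k: int, seed: int = 0) -> list[int]:
    """Random selection: k distinct indices drawn uniformly at random."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_items), k))


def uniform_selection(y: Sequence[float], k: int) -> list[int]:
    """Assumed 'uniformly distributed' selection: k indices at evenly
    spaced ranks of the sorted property values (e.g. log solubility)."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    if k == 1:
        return [order[len(order) // 2]]
    step = (len(order) - 1) / (k - 1)  # >= 1 whenever k <= len(y), so no repeats
    return sorted(order[int(j * step + 0.5)] for j in range(k))


def perimeter_selection(X: Sequence[Vector], k: int) -> list[int]:
    """Assumed 'perimeter-oriented' selection: the k compounds with the
    largest Euclidean distance from the descriptor-space centroid."""
    n_feat = len(X[0])
    centroid = [sum(row[j] for row in X) / len(X) for j in range(n_feat)]
    dist = [math.dist(row, centroid) for row in X]
    order = sorted(range(len(X)), key=lambda i: dist[i], reverse=True)
    return sorted(order[:k])


def sep(y_obs: Sequence[float], y_pred: Sequence[float]) -> float:
    """SEP taken here as the root-mean-square prediction error (one common definition)."""
    squared = [(o - p) ** 2 for o, p in zip(y_obs, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def pick_top_model(bank: dict[str, Model], X: Sequence[Vector],
                   y: Sequence[float], idx: Sequence[int]) -> str:
    """Return the name of the bank model with the lowest SEP on rows idx."""
    def score(name: str) -> float:
        model = bank[name]
        return sep([y[i] for i in idx], [model(X[i]) for i in idx])
    return min(bank, key=score)


if __name__ == "__main__":
    X = [[float(v)] for v in range(11)]  # toy data: one descriptor
    y = [2.0 * row[0] for row in X]      # toy property values
    bank: dict[str, Model] = {
        "linear": lambda x: 2.0 * x[0] + 0.5,                          # small bias everywhere
        "interior": lambda x: 2.0 * x[0] if 2 <= x[0] <= 8 else 10.0,  # wrong at the edges
    }
    print(pick_top_model(bank, X, y, [4, 5, 6]))                  # interior
    print(pick_top_model(bank, X, y, perimeter_selection(X, 2)))  # linear
```

In this toy run the "interior" model only fits the middle of the data. It wins on an arbitrary central subset (indices 4 to 6, not one of the three methods above) but loses on the perimeter-oriented subset. This is the kind of extrapolation failure that a perimeter-oriented validation set is meant to expose.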

Original language: English
Pages (from-to): 37-43
Number of pages: 7
Journal: Molecular Diversity
Volume: 7
Issue number: 1
DOI: 10.1023/B:MODI.0000006538.99122.00
Publication status: Published - 2003

Fingerprint

Set theory
Extrapolation
Computer Simulation
Solubility
universe
Water
Screening

Keywords

  • External validation
  • Extrapolation
  • Perimeter-oriented selection
  • Prediction
  • QSAR

ASJC Scopus subject areas

  • Chemistry (miscellaneous)
  • Drug Discovery
  • Organic Chemistry

Cite this

Validation subset selections for extrapolation oriented QSPAR models. / Szántai-Kis, Csaba; Kövesdi, István; Kéri, G.; Őrfi, L.

In: Molecular Diversity, Vol. 7, No. 1, 2003, p. 37-43.

Szántai-Kis, Csaba ; Kövesdi, István ; Kéri, G. ; Őrfi, L. / Validation subset selections for extrapolation oriented QSPAR models. In: Molecular Diversity. 2003 ; Vol. 7, No. 1. pp. 37-43.

@article{d3dcba42005744a88feb5ba1cd91a2fa,
title = "Validation subset selections for extrapolation oriented QSPAR models",
abstract = "One of the most important features of QSPAR models is their predictive ability. The predictive ability of QSPAR models should be checked by external validation. In this work we examined three different types of external validation set selection methods for their usefulness in in-silico screening. The usefulness of the selection methods was studied in such a way that: 1) We generated thousands of QSPR models and stored them in 'model banks'. 2) We selected a final top model from the model banks based on three different validation set selection methods. 3) We predicted large data sets, which we called 'chemical universe sets', and calculated the corresponding SEPs. The models were generated from small fractions of the available water solubility data during a GA Variable Subset Selection procedure. The external validation sets were constructed by random selections, uniformly distributed selections or by perimeter-oriented selections. We found that the best performing models on the perimeter-oriented external validation sets usually gave the best validation results when the remaining part of the available data was overwhelmingly large, i.e., when the model had to make a lot of extrapolations. We also compared the top final models obtained from external validation set selection methods in three independent and different sizes of 'chemical universe sets'.",
keywords = "External validation, Extrapolation, Perimeter-oriented selection, Prediction, QSAR",
author = "Csaba Sz{\'a}ntai-Kis and Istv{\'a}n K{\"o}vesdi and G. K{\'e}ri and L. {\H{O}}rfi",
year = "2003",
doi = "10.1023/B:MODI.0000006538.99122.00",
language = "English",
volume = "7",
pages = "37--43",
journal = "Molecular Diversity",
issn = "1381-1991",
publisher = "Springer Netherlands",
number = "1",

}

TY - JOUR

T1 - Validation subset selections for extrapolation oriented QSPAR models

AU - Szántai-Kis, Csaba

AU - Kövesdi, István

AU - Kéri, G.

AU - Őrfi, L.

PY - 2003

Y1 - 2003

N2 - One of the most important features of QSPAR models is their predictive ability. The predictive ability of QSPAR models should be checked by external validation. In this work we examined three different types of external validation set selection methods for their usefulness in in-silico screening. The usefulness of the selection methods was studied in such a way that: 1) We generated thousands of QSPR models and stored them in 'model banks'. 2) We selected a final top model from the model banks based on three different validation set selection methods. 3) We predicted large data sets, which we called 'chemical universe sets', and calculated the corresponding SEPs. The models were generated from small fractions of the available water solubility data during a GA Variable Subset Selection procedure. The external validation sets were constructed by random selections, uniformly distributed selections or by perimeter-oriented selections. We found that the best performing models on the perimeter-oriented external validation sets usually gave the best validation results when the remaining part of the available data was overwhelmingly large, i.e., when the model had to make a lot of extrapolations. We also compared the top final models obtained from external validation set selection methods in three independent and different sizes of 'chemical universe sets'.

AB - One of the most important features of QSPAR models is their predictive ability. The predictive ability of QSPAR models should be checked by external validation. In this work we examined three different types of external validation set selection methods for their usefulness in in-silico screening. The usefulness of the selection methods was studied in such a way that: 1) We generated thousands of QSPR models and stored them in 'model banks'. 2) We selected a final top model from the model banks based on three different validation set selection methods. 3) We predicted large data sets, which we called 'chemical universe sets', and calculated the corresponding SEPs. The models were generated from small fractions of the available water solubility data during a GA Variable Subset Selection procedure. The external validation sets were constructed by random selections, uniformly distributed selections or by perimeter-oriented selections. We found that the best performing models on the perimeter-oriented external validation sets usually gave the best validation results when the remaining part of the available data was overwhelmingly large, i.e., when the model had to make a lot of extrapolations. We also compared the top final models obtained from external validation set selection methods in three independent and different sizes of 'chemical universe sets'.

KW - External validation

KW - Extrapolation

KW - Perimeter-oriented selection

KW - Prediction

KW - QSAR

UR - http://www.scopus.com/inward/record.url?scp=18844475357&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=18844475357&partnerID=8YFLogxK

U2 - 10.1023/B:MODI.0000006538.99122.00

DO - 10.1023/B:MODI.0000006538.99122.00

M3 - Article

VL - 7

SP - 37

EP - 43

JO - Molecular Diversity

JF - Molecular Diversity

SN - 1381-1991

IS - 1

ER -