Validation subset selections for extrapolation oriented QSPAR models

Csaba Szántai-Kis, István Kövesdi, György Kéri, László Orfi

Research output: Contribution to journal (Article)

7 Citations (Scopus)


Predictive ability is one of the most important features of QSPAR models, and it should be checked by external validation. In this work we examined three methods of selecting external validation sets for their usefulness in in silico screening. The methods were evaluated as follows: 1) we generated thousands of QSPAR models and stored them in 'model banks'; 2) we selected a final top model from the model banks with each of the three validation set selection methods; 3) we used the selected models to predict large data sets, which we called 'chemical universe sets', and calculated the corresponding standard errors of prediction (SEPs). The models were generated from small fractions of the available water solubility data during a genetic algorithm (GA) variable subset selection procedure. The external validation sets were constructed by random selection, uniformly distributed selection or perimeter-oriented selection. We found that the models performing best on the perimeter-oriented external validation sets usually gave the best validation results when the remaining part of the available data was overwhelmingly large, i.e., when the model had to make many extrapolations. We also compared the final top models obtained with the three selection methods on three independent 'chemical universe sets' of different sizes.
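
The three selection schemes and the model-bank step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the algorithms, so the sketch assumes that "perimeter-oriented" means taking the compounds farthest from the centroid of descriptor space, that "uniformly distributed" means evenly spaced picks along the sorted property values, and that SEP is the root-mean-square prediction error. All function names are hypothetical.

```python
import math
import random


def sep(observed, predicted):
    """Root-mean-square prediction error (one common SEP definition, assumed here)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def random_selection(n_items, k, seed=0):
    """Pick k distinct indices at random."""
    return sorted(random.Random(seed).sample(range(n_items), k))


def uniform_selection(values, k):
    """Pick k indices evenly spaced along the sorted property values (assumed scheme)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    if k == 1:
        return [order[len(order) // 2]]
    n = len(order)
    return sorted(order[(j * (n - 1)) // (k - 1)] for j in range(k))


def perimeter_selection(points, k):
    """Pick the k points farthest from the centroid of descriptor space (assumed scheme)."""
    dim = len(points[0])
    centroid = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    ranked = sorted(range(len(points)),
                    key=lambda i: math.dist(points[i], centroid),
                    reverse=True)
    return sorted(ranked[:k])


def pick_top_model(model_bank, X, y, val_idx):
    """Return the model in the bank with the lowest SEP on the validation subset."""
    def val_sep(model):
        obs = [y[i] for i in val_idx]
        pred = [model(X[i]) for i in val_idx]
        return sep(obs, pred)
    return min(model_bank, key=val_sep)
```

With a model bank held as a list of callables, `pick_top_model(bank, X, y, perimeter_selection(X, k))` would return the candidate that scores best on the perimeter-oriented validation set. That model could then be scored with `sep` on a much larger held-out set, in the spirit of the 'chemical universe sets' described above.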

Original language: English
Pages (from-to): 37-43
Number of pages: 7
Journal: Molecular Diversity
Issue number: 1
Publication status: Published - Dec 1 2003

Keywords



  • External validation
  • Extrapolation
  • Perimeter-oriented selection
  • Prediction
  • QSAR

ASJC Scopus subject areas

  • Catalysis
  • Information Systems
  • Molecular Biology
  • Drug Discovery
  • Physical and Theoretical Chemistry
  • Organic Chemistry
  • Inorganic Chemistry
