Comparison of predictive ability of water solubility QSPR models generated by MLR, PLS and ANN methods

Daniel Erös, G. Kéri, István Kövesdi, Csaba Szántai-Kis, György Mészáros, L. Őrfi

Research output: Contribution to journal › Article

41 Citations (Scopus)

Abstract

ADME/Tox computational screening is one of the hottest topics in modern drug research. About half of potential drug candidates fail because of poor ADME/Tox properties. Since the experimental determination of water solubility is also time-consuming, reliable computational predictions are needed to pre-select acceptable "drug-like" compounds from diverse combinatorial libraries. Many successful attempts to predict the water solubility of compounds have been made recently. A comprehensive review of previously developed water solubility calculation methods is presented here, followed by a description of the solubility prediction method designed and used in our laboratory. We carefully selected 1381 compounds from scientific publications, collected them in a unified database, and used this dataset in the calculations. The externally validated models were based on calculated descriptors only. The aim of model optimization was to improve the statistics of repeated evaluations of the predictions, and effective descriptor scoring functions were used to allow quick generation of multiple linear regression (MLR), partial least squares (PLS) and artificial neural network (ANN) models with optimal predictive ability. The best model, an ANN with a 39-7-1 network structure, had a standard error of prediction of 0.72 logS units and a cross-validated squared correlation coefficient (Q²) better than 0.85. These values give a good chance of successfully pre-selecting screening compounds from virtual libraries on the basis of predicted water solubility.
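The abstract reports two figures of merit: a standard error of prediction (SEP) of 0.72 logS units and a cross-validated Q² above 0.85. This record does not give the paper's exact formulas, so the sketch below assumes the common definitions: SEP as the root-mean-square error of a set of predictions, and Q² = 1 − PRESS / SS_tot computed from leave-one-out predictions. The one-descriptor least-squares model, the function names and the toy data are hypothetical stand-ins, not the paper's 39-descriptor ANN or its 1381-compound dataset.

```python
import math
from typing import List, Sequence, Tuple


def sep(y_obs: Sequence[float], y_pred: Sequence[float]) -> float:
    """Standard error of prediction, ASSUMED here to be the RMSE (in logS units)."""
    n = len(y_obs)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / n)


def q2(y_obs: Sequence[float], y_cv_pred: Sequence[float]) -> float:
    """Cross-validated Q^2 = 1 - PRESS / total sum of squares about the mean."""
    mean = sum(y_obs) / len(y_obs)
    press = sum((o - p) ** 2 for o, p in zip(y_obs, y_cv_pred))
    ss_tot = sum((o - mean) ** 2 for o in y_obs)
    return 1.0 - press / ss_tot


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Hypothetical one-descriptor least-squares model y = a + b*x.

    Stands in for the MLR/PLS/ANN models of the paper; returns (a, b).
    """
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def loo_predictions(x: Sequence[float], y: Sequence[float]) -> List[float]:
    """Leave-one-out: refit without compound i, then predict compound i."""
    preds = []
    for i in range(len(x)):
        x_train = list(x[:i]) + list(x[i + 1:])
        y_train = list(y[:i]) + list(y[i + 1:])
        a, b = fit_line(x_train, y_train)
        preds.append(a + b * x[i])
    return preds


if __name__ == "__main__":
    # Toy, made-up data: one descriptor value and one logS value per compound.
    descriptor = [0.5, 1.2, 2.0, 2.8, 3.5, 4.1]
    log_s = [-0.8, -1.5, -2.4, -3.0, -3.9, -4.6]
    cv_pred = loo_predictions(descriptor, log_s)
    print(f"SEP = {sep(log_s, cv_pred):.3f} logS units, Q2 = {q2(log_s, cv_pred):.3f}")
```

Because Q² is computed from predictions for compounds left out of each fit, it is usually lower than the fitted R². The same `sep` function applies equally to predictions on an external test set, which is how externally validated models are typically scored.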

Original language: English
Pages (from-to): 167-177
Number of pages: 11
Journal: Mini-Reviews in Medicinal Chemistry
Volume: 4
Issue number: 2
DOI: 10.2174/1389557043487466
Publication status: Published - Feb 2004

Keywords

  • ADME
  • ANN
  • External validation
  • MLR
  • PLS
  • QSPR
  • Virtual screening
  • Water solubility

ASJC Scopus subject areas

  • Medicine (miscellaneous)
  • Chemistry(all)
  • Pharmacology

Cite this

Comparison of predictive ability of water solubility QSPR models generated by MLR, PLS and ANN methods. / Erös, Daniel; Kéri, G.; Kövesdi, István; Szántai-Kis, Csaba; Mészáros, György; Őrfi, L.

In: Mini-Reviews in Medicinal Chemistry, Vol. 4, No. 2, 02.2004, p. 167-177.

@article{9bf056ca19aa487b93d64ba7e70c9b84,
title = "Comparison of predictive ability of water solubility QSPR models generated by MLR, PLS and ANN methods",
keywords = "ADME, ANN, External validation, MLR, PLS, QSPR, Virtual screening, Water solubility",
author = "Daniel Er{\"o}s and G. K{\'e}ri and Istv{\'a}n K{\"o}vesdi and Csaba Sz{\'a}ntai-Kis and Gy{\"o}rgy M{\'e}sz{\'a}ros and L. Őrfi",
year = "2004",
month = "2",
doi = "10.2174/1389557043487466",
language = "English",
volume = "4",
pages = "167--177",
journal = "Mini-Reviews in Medicinal Chemistry",
issn = "1389-5575",
publisher = "Bentham Science Publishers B.V.",
number = "2",

}

TY - JOUR

T1 - Comparison of predictive ability of water solubility QSPR models generated by MLR, PLS and ANN methods

AU - Erös, Daniel

AU - Kéri, G.

AU - Kövesdi, István

AU - Szántai-Kis, Csaba

AU - Mészáros, György

AU - Őrfi, L.

PY - 2004/2

Y1 - 2004/2

KW - ADME

KW - ANN

KW - External validation

KW - MLR

KW - PLS

KW - QSPR

KW - Virtual screening

KW - Water solubility

UR - http://www.scopus.com/inward/record.url?scp=17044455173&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=17044455173&partnerID=8YFLogxK

U2 - 10.2174/1389557043487466

DO - 10.2174/1389557043487466

M3 - Article

C2 - 14965289

AN - SCOPUS:17044455173

VL - 4

SP - 167

EP - 177

JO - Mini-Reviews in Medicinal Chemistry

JF - Mini-Reviews in Medicinal Chemistry

SN - 1389-5575

IS - 2

ER -