Intercorrelation Limits in Molecular Descriptor Preselection for QSAR/QSPR

Anita Rácz, Dávid Bajusz, K. Heberger

Research output: Contribution to journal › Article

Abstract

QSAR/QSPR (quantitative structure-activity/property relationship) modeling has been a prevalent approach in various, overlapping sub-fields of computational, medicinal and environmental chemistry for decades. The generation and selection of molecular descriptors is an essential part of this process. In typical QSAR workflows, the starting pool of molecular descriptors is rationalized based on filtering out descriptors which are (i) constant throughout the whole dataset, or (ii) very strongly correlated to another descriptor. While the former is fairly straightforward, the latter involves a level of subjectivity when deciding what exactly is considered to be a strong correlation. Despite that, most QSAR modeling studies do not report on this step. In this study, we examine in detail the effect of various possible descriptor intercorrelation limits on the resulting QSAR models. Statistical comparisons are carried out based on four case studies from contemporary QSAR literature, using a combined methodology based on sum of ranking differences (SRD) and analysis of variance (ANOVA).
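
The two preselection filters described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: the abstract does not state which correlation measure or which removal rule was used, so the absolute Pearson correlation, the greedy "keep the first, drop later ones" rule, the default limit of 0.95, and the function name `preselect_descriptors` are all assumptions made for the example.

```python
import numpy as np

def preselect_descriptors(X, names, limit=0.95):
    """Hypothetical helper: return the names of descriptors that survive
    (i) constant-column removal and (ii) an intercorrelation limit.

    Assumptions (not taken from the paper): absolute Pearson correlation,
    and a greedy, order-dependent rule that keeps the earlier column.
    """
    X = np.asarray(X, dtype=float)

    # (i) Drop descriptors that are constant over the whole dataset.
    varying = [j for j in range(X.shape[1]) if X[:, j].max() != X[:, j].min()]
    if not varying:
        return []

    # Pairwise absolute Pearson correlations between the remaining columns.
    corr = np.abs(np.atleast_2d(np.corrcoef(X[:, varying], rowvar=False)))

    # (ii) Keep a column only if it does not exceed the limit
    # against any column that has already been kept.
    kept = []
    for a in range(len(varying)):
        if all(corr[a, b] <= limit for b in kept):
            kept.append(a)

    return [names[varying[a]] for a in kept]


# Toy data: d2 is an exact multiple of d1, d3 is constant.
X = [[1, 2, 5, 0.3],
     [2, 4, 5, 0.9],
     [3, 6, 5, 0.1],
     [4, 8, 5, 0.7]]
names = ["d1", "d2", "d3", "d4"]

survivors_strict = preselect_descriptors(X, names, limit=0.95)
survivors_loose = preselect_descriptors(X, names, limit=0.10)
print(survivors_strict, survivors_loose)
```

Running the function with several values of `limit` on the same descriptor pool shows how the choice of threshold changes which descriptors reach the modeling step, which is the source of subjectivity the abstract points to. Because the greedy rule depends on column order, a different ordering may retain a different but equally valid subset.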

Original language: English
Journal: Molecular Informatics
DOI: 10.1002/minf.201800154
Publication status: Published - Jan 1 2019

Fingerprint

Quantitative Structure-Activity Relationship
Analysis of variance (ANOVA)
Pharmaceutical Chemistry
Workflow

Keywords

  • analysis of variance
  • correlation
  • descriptor
  • QSAR
  • regression
  • sum of ranking differences

ASJC Scopus subject areas

  • Structural Biology
  • Molecular Medicine
  • Drug Discovery
  • Computer Science Applications
  • Organic Chemistry

Cite this

Intercorrelation Limits in Molecular Descriptor Preselection for QSAR/QSPR. / Rácz, Anita; Bajusz, Dávid; Heberger, K.

In: Molecular Informatics, 01.01.2019.

@article{7f1242f4850c434ebad55dfc92167dea,
title = "Intercorrelation Limits in Molecular Descriptor Preselection for QSAR/QSPR",
abstract = "QSAR/QSPR (quantitative structure-activity/property relationship) modeling has been a prevalent approach in various, overlapping sub-fields of computational, medicinal and environmental chemistry for decades. The generation and selection of molecular descriptors is an essential part of this process. In typical QSAR workflows, the starting pool of molecular descriptors is rationalized based on filtering out descriptors which are (i) constant throughout the whole dataset, or (ii) very strongly correlated to another descriptor. While the former is fairly straightforward, the latter involves a level of subjectivity when deciding what exactly is considered to be a strong correlation. Despite that, most QSAR modeling studies do not report on this step. In this study, we examine in detail the effect of various possible descriptor intercorrelation limits on the resulting QSAR models. Statistical comparisons are carried out based on four case studies from contemporary QSAR literature, using a combined methodology based on sum of ranking differences (SRD) and analysis of variance (ANOVA).",
keywords = "analysis of variance, correlation, descriptor, QSAR, regression, sum of ranking differences",
author = "Anita R{\'a}cz and D{\'a}vid Bajusz and K. Heberger",
year = "2019",
month = "1",
day = "1",
doi = "10.1002/minf.201800154",
language = "English",
journal = "Molecular Informatics",
issn = "1868-1743",
publisher = "Wiley - VCH Verlag GmbH & CO. KGaA",

}

TY - JOUR

T1 - Intercorrelation Limits in Molecular Descriptor Preselection for QSAR/QSPR

AU - Rácz, Anita

AU - Bajusz, Dávid

AU - Heberger, K.

PY - 2019/1/1

Y1 - 2019/1/1

AB - QSAR/QSPR (quantitative structure-activity/property relationship) modeling has been a prevalent approach in various, overlapping sub-fields of computational, medicinal and environmental chemistry for decades. The generation and selection of molecular descriptors is an essential part of this process. In typical QSAR workflows, the starting pool of molecular descriptors is rationalized based on filtering out descriptors which are (i) constant throughout the whole dataset, or (ii) very strongly correlated to another descriptor. While the former is fairly straightforward, the latter involves a level of subjectivity when deciding what exactly is considered to be a strong correlation. Despite that, most QSAR modeling studies do not report on this step. In this study, we examine in detail the effect of various possible descriptor intercorrelation limits on the resulting QSAR models. Statistical comparisons are carried out based on four case studies from contemporary QSAR literature, using a combined methodology based on sum of ranking differences (SRD) and analysis of variance (ANOVA).

KW - analysis of variance

KW - correlation

KW - descriptor

KW - QSAR

KW - regression

KW - sum of ranking differences

UR - http://www.scopus.com/inward/record.url?scp=85063993721&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85063993721&partnerID=8YFLogxK

U2 - 10.1002/minf.201800154

DO - 10.1002/minf.201800154

M3 - Article

JO - Molecular Informatics

JF - Molecular Informatics

SN - 1868-1743

ER -