Estimation of influential points in any data set from coefficient of determination and its leave-one-out cross-validated counterpart

Gergely Tóth, Zsolt Bodai, K. Heberger

Research output: Contribution to journal › Article

14 Citations (Scopus)

Abstract

The coefficient of determination (R²) and its leave-one-out cross-validated analogue (denoted Q² or R²cv) are the most frequently published values for characterizing the predictive performance of models. In this article we use R² and Q² in a reversed aspect to detect uncommon points, i.e. influential points, in any data set. The term (1 - Q²)/(1 - R²) corresponds to the ratio of the predictive residual sum of squares to the residual sum of squares. This ratio correlates with the number of influential points in experimental and random data sets. We propose an (approximate) F test on the (1 - Q²)/(1 - R²) term to quickly pre-estimate the presence of influential points in the training sets of models. The test is based on the routinely calculated Q² and R² values and warns model builders to verify the training set, to perform influence analysis, or even to switch to robust modeling.
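As a rough illustration of the ratio described in the abstract, the following minimal Python sketch computes R², Q² and (1 - Q²)/(1 - R²) for a one-predictor least-squares line. Several things here are assumptions and do not come from the article: the function name `r2_q2_ratio`, the toy data, the restriction to simple linear regression, and the convention that both R² and Q² are taken relative to the same total sum of squares about the full-sample mean (under which the ratio equals PRESS/RSS exactly). The leave-one-out residuals are obtained from the standard leverage shortcut e_i / (1 - h_i) rather than by refitting. The approximate F test itself is not attempted, since the abstract does not state its degrees of freedom.

```python
def r2_q2_ratio(x, y):
    """Return (R2, Q2, PRESS/RSS) for a simple least-squares line y = a + b*x.

    Illustrative sketch only: assumes one predictor and that R2 and Q2
    share the same total sum of squares (about the full-sample mean of y).
    """
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_mean - slope * x_mean

    # Ordinary residuals and leverages (diagonal of the hat matrix).
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    leverages = [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]

    # RSS: residual sum of squares.
    # PRESS: predictive residual sum of squares; each leave-one-out
    # residual equals e_i / (1 - h_i) for linear least squares.
    rss = sum(e * e for e in residuals)
    press = sum((e / (1.0 - h)) ** 2 for e, h in zip(residuals, leverages))
    tss = sum((yi - y_mean) ** 2 for yi in y)

    r2 = 1.0 - rss / tss
    q2 = 1.0 - press / tss
    return r2, q2, press / rss


# Hypothetical toy data: a nearly linear set, then the same set with the
# last response displaced to mimic an influential point.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y_clean = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0, 6.9, 8.1]
y_shifted = y_clean[:-1] + [12.0]

for label, y in (("clean", y_clean), ("shifted", y_shifted)):
    r2, q2, ratio = r2_q2_ratio(x, y)
    print(f"{label}: R2={r2:.4f} Q2={q2:.4f} (1-Q2)/(1-R2)={ratio:.3f}")
```

In this toy case the ratio is larger for the set with the displaced point, which is the direction of the effect the abstract describes; whether a given value is significant would depend on the F test specified in the full text.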

Original language: English
Pages (from-to): 837-844
Number of pages: 8
Journal: Journal of Computer-Aided Molecular Design
Volume: 27
Issue number: 10
DOI: 10.1007/s10822-013-9680-4
Publication status: Published - Oct 2013

Keywords

  • Coefficient of determination
  • Influence analysis
  • Leave-one-out cross-validation
  • Prediction
  • Quantitative structure activity relationships
  • Training set

ASJC Scopus subject areas

  • Drug Discovery
  • Physical and Theoretical Chemistry
  • Computer Science Applications

Cite this

Estimation of influential points in any data set from coefficient of determination and its leave-one-out cross-validated counterpart. / Tóth, Gergely; Bodai, Zsolt; Heberger, K.

In: Journal of Computer-Aided Molecular Design, Vol. 27, No. 10, 10.2013, p. 837-844.

@article{b025c0adcc2c4718b62ef39ec0ce8bd7,
title = "Estimation of influential points in any data set from coefficient of determination and its leave-one-out cross-validated counterpart",
abstract = "The coefficient of determination ($R^2$) and its leave-one-out cross-validated analogue (denoted $Q^2$ or $R^2_{\mathrm{cv}}$) are the most frequently published values for characterizing the predictive performance of models. In this article we use $R^2$ and $Q^2$ in a reversed aspect to detect uncommon points, i.e. influential points, in any data set. The term $(1 - Q^2)/(1 - R^2)$ corresponds to the ratio of the predictive residual sum of squares to the residual sum of squares. This ratio correlates with the number of influential points in experimental and random data sets. We propose an (approximate) F test on the $(1 - Q^2)/(1 - R^2)$ term to quickly pre-estimate the presence of influential points in the training sets of models. The test is based on the routinely calculated $Q^2$ and $R^2$ values and warns model builders to verify the training set, to perform influence analysis, or even to switch to robust modeling.",
keywords = "Coefficient of determination, Influence analysis, Leave-one-out cross-validation, Prediction, Quantitative structure activity relationships, Training set",
author = "Gergely T{\'o}th and Zsolt Bodai and K. Heberger",
year = "2013",
month = "10",
doi = "10.1007/s10822-013-9680-4",
language = "English",
volume = "27",
pages = "837--844",
journal = "Journal of Computer-Aided Molecular Design",
issn = "0920-654X",
publisher = "Springer Netherlands",
number = "10"
}

TY - JOUR

T1 - Estimation of influential points in any data set from coefficient of determination and its leave-one-out cross-validated counterpart

AU - Tóth, Gergely

AU - Bodai, Zsolt

AU - Heberger, K.

PY - 2013/10

Y1 - 2013/10

N2 - The coefficient of determination (R²) and its leave-one-out cross-validated analogue (denoted Q² or R²cv) are the most frequently published values for characterizing the predictive performance of models. In this article we use R² and Q² in a reversed aspect to detect uncommon points, i.e. influential points, in any data set. The term (1 - Q²)/(1 - R²) corresponds to the ratio of the predictive residual sum of squares to the residual sum of squares. This ratio correlates with the number of influential points in experimental and random data sets. We propose an (approximate) F test on the (1 - Q²)/(1 - R²) term to quickly pre-estimate the presence of influential points in the training sets of models. The test is based on the routinely calculated Q² and R² values and warns model builders to verify the training set, to perform influence analysis, or even to switch to robust modeling.

AB - The coefficient of determination (R²) and its leave-one-out cross-validated analogue (denoted Q² or R²cv) are the most frequently published values for characterizing the predictive performance of models. In this article we use R² and Q² in a reversed aspect to detect uncommon points, i.e. influential points, in any data set. The term (1 - Q²)/(1 - R²) corresponds to the ratio of the predictive residual sum of squares to the residual sum of squares. This ratio correlates with the number of influential points in experimental and random data sets. We propose an (approximate) F test on the (1 - Q²)/(1 - R²) term to quickly pre-estimate the presence of influential points in the training sets of models. The test is based on the routinely calculated Q² and R² values and warns model builders to verify the training set, to perform influence analysis, or even to switch to robust modeling.

KW - Coefficient of determination

KW - Influence analysis

KW - Leave-one-out cross-validation

KW - Prediction

KW - Quantitative structure activity relationships

KW - Training set

UR - http://www.scopus.com/inward/record.url?scp=84890554121&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84890554121&partnerID=8YFLogxK

U2 - 10.1007/s10822-013-9680-4

DO - 10.1007/s10822-013-9680-4

M3 - Article

C2 - 24141986

AN - SCOPUS:84890554121

VL - 27

SP - 837

EP - 844

JO - Journal of Computer-Aided Molecular Design

JF - Journal of Computer-Aided Molecular Design

SN - 0920-654X

IS - 10

ER -