How to avoid over-fitting in multivariate calibration-The conventional validation approach and an alternative

N. M. Faber, R. Rajkó

Research output: Contribution to journal › Article

125 Citations (Scopus)

Abstract

This paper critically reviews the problem of over-fitting in multivariate calibration and the conventional validation-based approach to avoid it. It proposes a randomization test that enables one to assess the statistical significance of each component that enters the model. This alternative is compared with cross-validation and independent test set validation for the calibration of a near-infrared spectral data set using partial least squares (PLS) regression. The results indicate that the alternative approach is more objective, since, unlike the validation-based approach, it does not require the use of 'soft' decision rules. The alternative approach therefore appears to be a useful addition to the chemometrician's toolbox.
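The abstract does not spell out the test statistic or the permutation scheme, so the following is only a minimal sketch of what a per-component randomization test for PLS1 could look like, not the authors' procedure. It assumes the statistic is the absolute covariance between the component scores and the response, that the null distribution comes from permuting the response, and that both X and y are deflated after each component. The function names `pls1_component` and `randomization_test` are hypothetical.

```python
import numpy as np

def pls1_component(X, y):
    """Compute one PLS1 component: weights, scores, and |score . response|."""
    w = X.T @ y
    w = w / np.linalg.norm(w)   # unit-length weight vector
    t = X @ w                   # scores for this component
    return w, t, abs(t @ y)     # assumed test statistic: |t'y|

def randomization_test(X, y, n_components=3, n_perm=500, seed=0):
    """Return one permutation p-value per PLS component (illustrative only)."""
    rng = np.random.default_rng(seed)
    X = X - X.mean(axis=0)      # column-centre the predictors
    y = y - y.mean()            # centre the response
    p_values = []
    for _ in range(n_components):
        _, t, observed = pls1_component(X, y)
        # Null distribution: same statistic with the response shuffled.
        null = np.array([
            pls1_component(X, rng.permutation(y))[2] for _ in range(n_perm)
        ])
        # Add-one correction keeps the p-value strictly positive.
        p_values.append((1 + np.sum(null >= observed)) / (n_perm + 1))
        # Deflate X and y so the next component sees only what is left.
        p = X.T @ t / (t @ t)
        q = (y @ t) / (t @ t)
        X = X - np.outer(t, p)
        y = y - q * t
    return p_values
```

Under these assumptions, one would keep adding components while the p-value stays below a chosen significance level and stop at the first component that fails the test.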

Original language: English
Pages (from-to): 98-106
Number of pages: 9
Journal: Analytica Chimica Acta
Volume: 595
Issue number: 1-2 SPEC. ISS.
DOI: 10.1016/j.aca.2007.05.030
Publication status: Published - Jul 9 2007

Keywords

  • Component selection
  • Cross-validation
  • Multivariate calibration
  • Near-infrared spectroscopy
  • PLS
  • Randomization test
  • Test set validation

ASJC Scopus subject areas

  • Biochemistry
  • Analytical Chemistry
  • Spectroscopy
  • Environmental Chemistry

Cite this

How to avoid over-fitting in multivariate calibration-The conventional validation approach and an alternative. / Faber, N. M.; Rajkó, R.

In: Analytica Chimica Acta, Vol. 595, No. 1-2 SPEC. ISS., 09.07.2007, p. 98-106.

@article{f89b418b3ede498991ac02600092d3ae,
title = "How to avoid over-fitting in multivariate calibration-The conventional validation approach and an alternative",
abstract = "This paper critically reviews the problem of over-fitting in multivariate calibration and the conventional validation-based approach to avoid it. It proposes a randomization test that enables one to assess the statistical significance of each component that enters the model. This alternative is compared with cross-validation and independent test set validation for the calibration of a near-infrared spectral data set using partial least squares (PLS) regression. The results indicate that the alternative approach is more objective, since, unlike the validation-based approach, it does not require the use of 'soft' decision rules. The alternative approach therefore appears to be a useful addition to the chemometrician's toolbox.",
keywords = "Component selection, Cross-validation, Multivariate calibration, Near-infrared spectroscopy, PLS, Randomization test, Test set validation",
author = "Faber, {N. M.} and Rajk{\'o}, R.",
year = "2007",
month = "7",
day = "9",
doi = "10.1016/j.aca.2007.05.030",
language = "English",
volume = "595",
pages = "98--106",
journal = "Analytica Chimica Acta",
issn = "0003-2670",
publisher = "Elsevier",
number = "1-2 SPEC. ISS.",
}

TY  - JOUR
T1  - How to avoid over-fitting in multivariate calibration-The conventional validation approach and an alternative
AU  - Faber, N. M.
AU  - Rajkó, R.
PY  - 2007/7/9
Y1  - 2007/7/9
AB  - This paper critically reviews the problem of over-fitting in multivariate calibration and the conventional validation-based approach to avoid it. It proposes a randomization test that enables one to assess the statistical significance of each component that enters the model. This alternative is compared with cross-validation and independent test set validation for the calibration of a near-infrared spectral data set using partial least squares (PLS) regression. The results indicate that the alternative approach is more objective, since, unlike the validation-based approach, it does not require the use of 'soft' decision rules. The alternative approach therefore appears to be a useful addition to the chemometrician's toolbox.
KW  - Component selection
KW  - Cross-validation
KW  - Multivariate calibration
KW  - Near-infrared spectroscopy
KW  - PLS
KW  - Randomization test
KW  - Test set validation
UR  - http://www.scopus.com/inward/record.url?scp=34250813108&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=34250813108&partnerID=8YFLogxK
U2  - 10.1016/j.aca.2007.05.030
DO  - 10.1016/j.aca.2007.05.030
M3  - Article
C2  - 17605988
AN  - SCOPUS:34250813108
VL  - 595
SP  - 98
EP  - 106
JO  - Analytica Chimica Acta
JF  - Analytica Chimica Acta
SN  - 0003-2670
IS  - 1-2 SPEC. ISS.
ER  -