Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics

Anita Rácz, Dávid Bajusz, K. Heberger

Research output: Contribution to journal › Article

Abstract

Machine learning classification algorithms are widely used for the prediction and classification of the different properties of molecules such as toxicity or biological activity. The prediction of toxic vs. non-toxic molecules is important due to testing on living animals, which has ethical and cost drawbacks as well. The quality of classification models can be determined with several performance parameters, which often give conflicting results. In this study, we performed a multi-level comparison with the use of different performance metrics and machine learning classification methods. Well-established and standardized protocols for the machine learning tasks were used in each case. The comparison was applied to three datasets (acute and aquatic toxicities) and the robust, yet sensitive, sum of ranking differences (SRD) and analysis of variance (ANOVA) were applied for evaluation. The effect of dataset composition (balanced vs. imbalanced) and 2-class vs. multiclass classification scenarios was also studied. Most of the performance metrics are sensitive to dataset composition, especially in 2-class classification problems. The optimal machine learning algorithm also depends significantly on the composition of the dataset.

Original language: English
Journal: Molecules (Basel, Switzerland)
Volume: 24
Issue number: 15
DOI: 10.3390/molecules24152811
Publication status: Published - Aug 1 2019

Keywords

  • ANOVA
  • classifiers
  • machine learning
  • performance metrics
  • ranking
  • ROC
  • toxicity prediction

ASJC Scopus subject areas

  • Analytical Chemistry
  • Chemistry (miscellaneous)
  • Molecular Medicine
  • Pharmaceutical Science
  • Drug Discovery
  • Physical and Theoretical Chemistry
  • Organic Chemistry

Cite this

Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics. / Rácz, Anita; Bajusz, Dávid; Heberger, K.

In: Molecules (Basel, Switzerland), Vol. 24, No. 15, 01.08.2019.

@article{6d861ae7eeb042878c65fe591d456a48,
title = "Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics",
abstract = "Machine learning classification algorithms are widely used for the prediction and classification of the different properties of molecules such as toxicity or biological activity. The prediction of toxic vs. non-toxic molecules is important due to testing on living animals, which has ethical and cost drawbacks as well. The quality of classification models can be determined with several performance parameters, which often give conflicting results. In this study, we performed a multi-level comparison with the use of different performance metrics and machine learning classification methods. Well-established and standardized protocols for the machine learning tasks were used in each case. The comparison was applied to three datasets (acute and aquatic toxicities) and the robust, yet sensitive, sum of ranking differences (SRD) and analysis of variance (ANOVA) were applied for evaluation. The effect of dataset composition (balanced vs. imbalanced) and 2-class vs. multiclass classification scenarios was also studied. Most of the performance metrics are sensitive to dataset composition, especially in 2-class classification problems. The optimal machine learning algorithm also depends significantly on the composition of the dataset.",
keywords = "ANOVA, classifiers, machine learning, performance metrics, ranking, ROC, toxicity prediction",
author = "Anita R{\'a}cz and D{\'a}vid Bajusz and K. Heberger",
year = "2019",
month = "8",
day = "1",
doi = "10.3390/molecules24152811",
language = "English",
volume = "24",
journal = "Molecules",
issn = "1420-3049",
publisher = "Multidisciplinary Digital Publishing Institute (MDPI)",
number = "15",

}

TY - JOUR

T1 - Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics

AU - Rácz, Anita

AU - Bajusz, Dávid

AU - Heberger, K.

PY - 2019/8/1

Y1 - 2019/8/1

N2 - Machine learning classification algorithms are widely used for the prediction and classification of the different properties of molecules such as toxicity or biological activity. The prediction of toxic vs. non-toxic molecules is important due to testing on living animals, which has ethical and cost drawbacks as well. The quality of classification models can be determined with several performance parameters, which often give conflicting results. In this study, we performed a multi-level comparison with the use of different performance metrics and machine learning classification methods. Well-established and standardized protocols for the machine learning tasks were used in each case. The comparison was applied to three datasets (acute and aquatic toxicities) and the robust, yet sensitive, sum of ranking differences (SRD) and analysis of variance (ANOVA) were applied for evaluation. The effect of dataset composition (balanced vs. imbalanced) and 2-class vs. multiclass classification scenarios was also studied. Most of the performance metrics are sensitive to dataset composition, especially in 2-class classification problems. The optimal machine learning algorithm also depends significantly on the composition of the dataset.

AB - Machine learning classification algorithms are widely used for the prediction and classification of the different properties of molecules such as toxicity or biological activity. The prediction of toxic vs. non-toxic molecules is important due to testing on living animals, which has ethical and cost drawbacks as well. The quality of classification models can be determined with several performance parameters, which often give conflicting results. In this study, we performed a multi-level comparison with the use of different performance metrics and machine learning classification methods. Well-established and standardized protocols for the machine learning tasks were used in each case. The comparison was applied to three datasets (acute and aquatic toxicities) and the robust, yet sensitive, sum of ranking differences (SRD) and analysis of variance (ANOVA) were applied for evaluation. The effect of dataset composition (balanced vs. imbalanced) and 2-class vs. multiclass classification scenarios was also studied. Most of the performance metrics are sensitive to dataset composition, especially in 2-class classification problems. The optimal machine learning algorithm also depends significantly on the composition of the dataset.

KW - ANOVA

KW - classifiers

KW - machine learning

KW - performance metrics

KW - ranking

KW - ROC

KW - toxicity prediction

UR - http://www.scopus.com/inward/record.url?scp=85071186783&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85071186783&partnerID=8YFLogxK

U2 - 10.3390/molecules24152811

DO - 10.3390/molecules24152811

M3 - Article

VL - 24

JO - Molecules

JF - Molecules

SN - 1420-3049

IS - 15

ER -