Life beyond the Tanimoto coefficient: Similarity measures for interaction fingerprints

Anita Rácz, Dávid Bajusz, K. Heberger

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

Background: Interaction fingerprints (IFPs) have repeatedly been shown to be valuable tools in virtual screening for identifying novel hit compounds that can subsequently be optimized into drug candidates. As a complementary method to ligand docking, IFPs can be used to quantify the similarity of predicted binding poses to a reference binding pose. For this purpose, a large number of similarity metrics can be applied, and various parameters of the IFPs themselves can be customized. In a large-scale comparison, we assessed the effect of similarity metrics and IFP configurations on a number of virtual screening scenarios with ten different protein targets and thousands of molecules. In particular, we studied the effect of considering general interaction definitions (such as Any Contact, Backbone Interaction and Sidechain Interaction), the effect of filtering methods, and the different groups of similarity metrics.

Results: Performance was primarily compared based on AUC values, but we also used the original similarity data to compare the similarity metrics with several statistical tests and the novel, robust sum of ranking differences (SRD) algorithm. With SRD, we can evaluate the consistency (or concordance) of the various similarity metrics with an ideal reference metric, which is provided by data fusion from the existing metrics. Different aspects of IFP configurations and similarity metrics were examined based on SRD values with analysis of variance (ANOVA) tests.

Conclusion: A general approach is provided that can be applied for the reliable interpretation and usage of similarity measures with interaction fingerprints. Metrics that are viable alternatives to the commonly used Tanimoto coefficient were identified based on a comparison with an ideal reference metric (consensus). A careful selection of the applied bits (interaction definitions) and IFP filtering rules can improve the results of virtual screening (in terms of their agreement with the consensus metric). The open-source Python package FPKit was introduced for the similarity calculations and IFP filtering; it is available at https://github.com/davidbajusz/fpkit.
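To make the comparison described in the abstract concrete, the following is a minimal sketch of how binary interaction fingerprints might be scored against a reference pose with the Tanimoto coefficient and two common alternatives, and how a sum of ranking differences (SRD) against a consensus could then be computed. It does not use or reproduce the FPKit API: all function names, the toy bit vectors, the choice of the plain mean as the data-fusion consensus, and the convention of returning 0.0 for two empty fingerprints are assumptions made for illustration only.

```python
from typing import Dict, List, Sequence, Tuple


def contingency(a: Sequence[int], b: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count on/on, on/off, off/on and off/off bit pairs of two bit vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    only_a = sum(1 for x, y in zip(a, b) if x and not y)
    only_b = sum(1 for x, y in zip(a, b) if y and not x)
    neither = len(a) - both - only_a - only_b
    return both, only_a, only_b, neither


def tanimoto(a: Sequence[int], b: Sequence[int]) -> float:
    both, only_a, only_b, _ = contingency(a, b)
    denom = both + only_a + only_b
    return both / denom if denom else 0.0  # assumed convention for empty inputs


def dice(a: Sequence[int], b: Sequence[int]) -> float:
    both, only_a, only_b, _ = contingency(a, b)
    denom = 2 * both + only_a + only_b
    return 2 * both / denom if denom else 0.0  # assumed convention for empty inputs


def simple_matching(a: Sequence[int], b: Sequence[int]) -> float:
    both, _, _, neither = contingency(a, b)
    return (both + neither) / len(a) if len(a) else 0.0


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ascending ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def srd(scores: Sequence[float], reference: Sequence[float]) -> float:
    """Sum of absolute rank differences between one metric and the reference."""
    pairs = zip(average_ranks(scores), average_ranks(reference))
    return sum(abs(r - s) for r, s in pairs)


# Toy data (hypothetical): one reference IFP and four docked poses.
reference_ifp = [1, 1, 0, 1, 0, 0, 1, 0]
poses = [
    [1, 1, 0, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0, 1, 0],
    [1, 0, 1, 0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
metrics = {"tanimoto": tanimoto, "dice": dice, "simple_matching": simple_matching}

# One score per pose for every metric.
scores: Dict[str, List[float]] = {
    name: [fn(reference_ifp, pose) for pose in poses] for name, fn in metrics.items()
}
# Assumed data fusion: per-pose mean over all metrics as the consensus reference.
consensus = [sum(col) / len(col) for col in zip(*scores.values())]
srd_values = {name: srd(vals, consensus) for name, vals in scores.items()}
print(srd_values)
```

In this toy case the simple matching coefficient also rewards shared "off" bits, so it ranks the empty fingerprint above a pose with one shared interaction and ends up with a larger SRD than Tanimoto or Dice. A smaller SRD means closer agreement with the consensus ranking; in practice, metrics with different value ranges would likely need to be scaled before fusion.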

Original language: English
Article number: 48
Journal: Journal of Cheminformatics
Volume: 10
Issue number: 1
DOI: 10.1186/s13321-018-0302-y
Publication status: Published - Oct 4 2018

Keywords

  • ANOVA
  • Binary fingerprints
  • FPKit
  • Interaction fingerprint
  • Similarity metrics
  • SRD
  • Virtual screening

ASJC Scopus subject areas

  • Computer Science Applications
  • Physical and Theoretical Chemistry
  • Computer Graphics and Computer-Aided Design
  • Library and Information Sciences

Cite this

Life beyond the Tanimoto coefficient: Similarity measures for interaction fingerprints. / Rácz, Anita; Bajusz, Dávid; Heberger, K.

In: Journal of Cheminformatics, Vol. 10, No. 1, 48, 04.10.2018.

@article{2499081c73ea4969a1f47888820e7431,
title = "Life beyond the Tanimoto coefficient: Similarity measures for interaction fingerprints",
keywords = "ANOVA, Binary fingerprints, FPKit, Interaction fingerprint, Similarity metrics, SRD, Virtual screening",
author = "Anita R{\'a}cz and D{\'a}vid Bajusz and K. Heberger",
year = "2018",
month = "10",
day = "4",
doi = "10.1186/s13321-018-0302-y",
language = "English",
volume = "10",
journal = "Journal of Cheminformatics",
issn = "1758-2946",
publisher = "Chemistry Central",
number = "1",

}

TY - JOUR

T1 - Life beyond the Tanimoto coefficient

T2 - Similarity measures for interaction fingerprints

AU - Rácz, Anita

AU - Bajusz, Dávid

AU - Heberger, K.

PY - 2018/10/4

Y1 - 2018/10/4

KW - ANOVA

KW - Binary fingerprints

KW - FPKit

KW - Interaction fingerprint

KW - Similarity metrics

KW - SRD

KW - Virtual screening

UR - http://www.scopus.com/inward/record.url?scp=85054537670&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054537670&partnerID=8YFLogxK

U2 - 10.1186/s13321-018-0302-y

DO - 10.1186/s13321-018-0302-y

M3 - Article

AN - SCOPUS:85054537670

VL - 10

JO - Journal of Cheminformatics

JF - Journal of Cheminformatics

SN - 1758-2946

IS - 1

M1 - 48

ER -