A comprehensive benchmark of kernel methods to extract protein-protein interactions from literature

D. Tikk, Philippe Thomas, Peter Palaga, Jörg Hakenberg, Ulf Leser

Research output: Contribution to journal › Article

84 Citations (Scopus)

Abstract

The most important way of conveying new findings in biomedical research is scientific publication. Extraction of protein-protein interactions (PPIs) reported in scientific publications is one of the core topics of text mining in the life sciences. Recently, a new class of such methods has been proposed: convolution kernels that identify PPIs using deep parses of sentences. However, comparing published results of different PPI extraction methods is impossible due to the use of different evaluation corpora, different evaluation metrics, different tuning procedures, etc. In this paper, we study whether the reported performance metrics are robust across different corpora and learning settings and whether the use of deep parsing actually leads to an increase in extraction quality. Our ultimate goal is to identify the one method that performs best in real-life scenarios, where information extraction is performed on unseen text and not on specifically prepared evaluation data. We performed a comprehensive benchmark of nine different methods for PPI extraction that use convolution kernels on rich linguistic information. Methods were evaluated on five different public corpora using cross-validation, cross-learning, and cross-corpus evaluation. Our study confirms that kernels using dependency trees generally outperform kernels based on syntax trees. However, our study also shows that only the best kernel methods can compete with a simple rule-based approach when the evaluation prevents information leakage between training and test corpora. Our results further reveal that the F-score of many approaches drops significantly if no corpus-specific parameter optimization is applied and that methods reaching a good AUC score often perform much worse in terms of F-score. We conclude that for most kernels no sensible estimation of PPI extraction performance on new text is possible, given the current heterogeneity in evaluation data. Nevertheless, our study shows that three kernels are clearly superior to the other methods.
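The abstract names three evaluation settings and contrasts AUC with F-score. The Python sketch below illustrates both ideas under stated assumptions; it is not the authors' benchmark code and involves no kernel or parser. The function names (`cross_validation`, `cross_learning`, `cross_corpus`, `f_score`, `auc`) and the toy data are hypothetical. Cross-learning is read here as training on all corpora but one and testing on the held-out corpus, and cross-corpus as training on a single corpus and testing on each of the others; both readings are assumptions of this sketch.

```python
# Hypothetical sketch: evaluation splits and metrics named in the abstract.
# Corpora are plain lists of items; all names and toy data are illustrative.
from typing import Dict, Iterator, List, Sequence, Tuple

Split = Tuple[str, List, List]  # (description, training items, test items)


def cross_validation(name: str, corpus: Sequence, k: int = 10) -> Iterator[Split]:
    """k-fold CV within one corpus (round-robin fold assignment, an assumption)."""
    for fold in range(k):
        train = [x for i, x in enumerate(corpus) if i % k != fold]
        test = [x for i, x in enumerate(corpus) if i % k == fold]
        yield (f"CV {name} fold {fold}", train, test)


def cross_learning(corpora: Dict[str, Sequence]) -> Iterator[Split]:
    """Assumed reading: train on all corpora but one, test on the held-out one."""
    for held_out, test in corpora.items():
        train = [x for n, c in corpora.items() if n != held_out for x in c]
        yield (f"CL test={held_out}", train, list(test))


def cross_corpus(corpora: Dict[str, Sequence]) -> Iterator[Split]:
    """Assumed reading: train on one corpus, test on each other corpus."""
    for src, train in corpora.items():
        for dst, test in corpora.items():
            if dst != src:
                yield (f"CC {src}->{dst}", list(train), list(test))


def f_score(labels: Sequence[int], scores: Sequence[float], threshold: float) -> float:
    """F1 of the positive class after cutting real-valued scores at a threshold."""
    pred = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, pred) if y == 1 and p)
    fp = sum(1 for y, p in zip(labels, pred) if y == 0 and p)
    fn = sum(1 for y, p in zip(labels, pred) if y == 1 and not p)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC: share of (positive, negative) pairs ranked correctly, ties = 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Usage with toy data (invented for illustration, not from the paper).
corpora = {"A": list(range(5)), "B": list(range(5, 8)), "C": list(range(8, 10))}
print(len(list(cross_corpus(corpora))))   # 6 train/test pairs for 3 corpora
print(len(list(cross_learning(corpora))))  # 3 held-out corpora

# Every positive outranks every negative, but all scores fall below 0.5.
labels = [1, 1, 0, 0, 0]
scores = [0.40, 0.35, 0.30, 0.20, 0.10]
print(auc(labels, scores))            # 1.0
print(f_score(labels, scores, 0.5))   # 0.0: no positives predicted
print(f_score(labels, scores, 0.35))  # 1.0 once the threshold is tuned
```

The toy scores rank every positive above every negative, so AUC is 1.0, yet a fixed 0.5 cut-off predicts no positives and gives an F-score of 0; lowering the threshold to 0.35 recovers an F-score of 1.0. This threshold dependence is one plausible mechanism behind a gap between AUC and F-score, shown on invented data rather than as a reproduction of the paper's results. Cross-learning and cross-corpus splits keep every test corpus out of training, which is the kind of separation the abstract describes as preventing information leakage.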

Original language: English
Article number: e1000837
Pages (from-to): 32
Number of pages: 1
Journal: PLoS Computational Biology
Volume: 6
Issue number: 7
DOIs: 10.1371/journal.pcbi.1000837
Publication status: Published, Jul 2010

ASJC Scopus subject areas

  • Cellular and Molecular Neuroscience
  • Ecology
  • Molecular Biology
  • Genetics
  • Ecology, Evolution, Behavior and Systematics
  • Modelling and Simulation
  • Computational Theory and Mathematics

Cite this

A comprehensive benchmark of kernel methods to extract protein-protein interactions from literature. / Tikk, D.; Thomas, Philippe; Palaga, Peter; Hakenberg, Jörg; Leser, Ulf.

In: PLoS Computational Biology, Vol. 6, No. 7, e1000837, 07.2010, p. 32.

Tikk, D. ; Thomas, Philippe ; Palaga, Peter ; Hakenberg, Jörg ; Leser, Ulf. / A comprehensive benchmark of kernel methods to extract protein-protein interactions from literature. In: PLoS Computational Biology. 2010 ; Vol. 6, No. 7. pp. 32.
@article{b591456f9ab049c4a65fb9680a7c9290,
title = "A comprehensive benchmark of kernel methods to extract protein-protein interactions from literature",
author = "D. Tikk and Philippe Thomas and Peter Palaga and J{\"o}rg Hakenberg and Ulf Leser",
year = "2010",
month = "7",
doi = "10.1371/journal.pcbi.1000837",
language = "English",
volume = "6",
pages = "32",
journal = "PLoS Computational Biology",
issn = "1553-734X",
publisher = "Public Library of Science",
number = "7",

}

TY - JOUR

T1 - A comprehensive benchmark of kernel methods to extract protein-protein interactions from literature

AU - Tikk, D.

AU - Thomas, Philippe

AU - Palaga, Peter

AU - Hakenberg, Jörg

AU - Leser, Ulf

PY - 2010/7

Y1 - 2010/7

UR - http://www.scopus.com/inward/record.url?scp=78049245506&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78049245506&partnerID=8YFLogxK

U2 - 10.1371/journal.pcbi.1000837

DO - 10.1371/journal.pcbi.1000837

M3 - Article

VL - 6

SP - 32

JO - PLoS Computational Biology

JF - PLoS Computational Biology

SN - 1553-734X

IS - 7

M1 - e1000837

ER -