Improving textual medication extraction using combined conditional random fields and rule-based systems

D. Tikk, Illés Solt

Research output: Contribution to journal › Article

18 Citations (Scopus)

Abstract

Objective: In the i2b2 Medication Extraction Challenge, medication names, together with details of their administration, were to be extracted from medical discharge summaries. Design: The task of the challenge was decomposed into three pipelined components: named entity identification (NEI), context-aware filtering, and relation extraction. For NEI, a rule-based (RB) method, used in our solution that ranked fifth overall at the challenge, was investigated first. Second, a conditional random fields (CRF) approach to NEI, developed after the completion of the challenge, is presented. The CRF models are trained on two sources: the 17 ground truth documents, and the output of the rule-based NEI component on all documents, a larger but potentially inaccurate training dataset. For both NEI approaches, their effect on relation extraction performance was investigated. The filtering and relation extraction components are both rule-based. Measurements: In addition to the official entry-level evaluation of the challenge, an entity-level analysis is also provided. Results: On the test data, the RB-NEI component achieved an entry-level F1-score of 80% for exact matching and 81% for inexact matching. The CRF produces a significantly weaker result, but CRF outperforms the rule-based model with 81% exact and 82% inexact F1-score (p
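The results above are reported separately for exact and inexact matching. The sketch below illustrates that distinction on character-offset spans. It is a minimal sketch, not the official i2b2 scoring script: the names `Span`, `overlaps` and `span_f1` are hypothetical, and "inexact" is simplified here to "the predicted span overlaps a gold span by at least one character". It scores individual spans (closer to the entity-level view) and does not model entry-level scoring of a medication together with its administration details.

```python
from typing import Callable, List, Tuple

# A span is a (start, end) pair of character offsets, end exclusive (hypothetical representation).
Span = Tuple[int, int]


def overlaps(a: Span, b: Span) -> bool:
    """True if the two half-open spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def span_f1(pred: List[Span], gold: List[Span], exact: bool = True) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for predicted vs. gold spans.

    exact=True  -> a prediction counts only if its offsets equal a gold span.
    exact=False -> simplified inexact matching (assumption): any overlap counts.
    """
    match: Callable[[Span, Span], bool] = (lambda p, g: p == g) if exact else overlaps
    # Precision counts matched predictions; recall counts matched gold spans,
    # so several predictions overlapping one gold span do not inflate recall.
    matched_pred = sum(any(match(p, g) for g in gold) for p in pred)
    matched_gold = sum(any(match(p, g) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [(0, 7), (20, 30)]             # e.g. a drug name and a dosage
    pred = [(0, 7), (22, 30), (40, 45)]   # one exact hit, one partial hit, one spurious span
    print(span_f1(pred, gold, exact=True))   # precision 1/3, recall 1/2, F1 ~0.4
    print(span_f1(pred, gold, exact=False))  # precision 2/3, recall 1.0, F1 ~0.8
```

Because any exact match between non-empty spans is also an overlap, both precision and recall can only rise under this inexact definition, so the inexact F1 never falls below the exact one.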

Original language: English
Pages (from-to): 540-544
Number of pages: 5
Journal: Journal of the American Medical Informatics Association
Volume: 17
Issue number: 5
DOI: 10.1136/jamia.2010.004119
Publication status: Published, Sep 2010

ASJC Scopus subject areas

  • Health Informatics
  • Medicine(all)

Cite this

Improving textual medication extraction using combined conditional random fields and rule-based systems. / Tikk, D.; Solt, Illés.

In: Journal of the American Medical Informatics Association, Vol. 17, No. 5, 09.2010, p. 540-544.

@article{4d8ff36c1ccb4e4e878227df1fa6bb84,
title = "Improving textual medication extraction using combined conditional random fields and rule-based systems",
abstract = "Objective: In the i2b2 Medication Extraction Challenge, medication names, together with details of their administration, were to be extracted from medical discharge summaries. Design: The task of the challenge was decomposed into three pipelined components: named entity identification (NEI), context-aware filtering, and relation extraction. For NEI, a rule-based (RB) method, used in our solution that ranked fifth overall at the challenge, was investigated first. Second, a conditional random fields (CRF) approach to NEI, developed after the completion of the challenge, is presented. The CRF models are trained on two sources: the 17 ground truth documents, and the output of the rule-based NEI component on all documents, a larger but potentially inaccurate training dataset. For both NEI approaches, their effect on relation extraction performance was investigated. The filtering and relation extraction components are both rule-based. Measurements: In addition to the official entry-level evaluation of the challenge, an entity-level analysis is also provided. Results: On the test data, the RB-NEI component achieved an entry-level F1-score of 80{\%} for exact matching and 81{\%} for inexact matching. The CRF produces a significantly weaker result, but CRF outperforms the rule-based model with 81{\%} exact and 82{\%} inexact F1-score (p",
author = "D. Tikk and Ill{\'e}s Solt",
year = "2010",
month = "9",
doi = "10.1136/jamia.2010.004119",
language = "English",
volume = "17",
pages = "540--544",
journal = "Journal of the American Medical Informatics Association : JAMIA",
issn = "1067-5027",
publisher = "Oxford University Press",
number = "5",

}

TY - JOUR

T1 - Improving textual medication extraction using combined conditional random fields and rule-based systems

AU - Tikk, D.

AU - Solt, Illés

PY - 2010/9

Y1 - 2010/9

AB - Objective: In the i2b2 Medication Extraction Challenge, medication names, together with details of their administration, were to be extracted from medical discharge summaries. Design: The task of the challenge was decomposed into three pipelined components: named entity identification (NEI), context-aware filtering, and relation extraction. For NEI, a rule-based (RB) method, used in our solution that ranked fifth overall at the challenge, was investigated first. Second, a conditional random fields (CRF) approach to NEI, developed after the completion of the challenge, is presented. The CRF models are trained on two sources: the 17 ground truth documents, and the output of the rule-based NEI component on all documents, a larger but potentially inaccurate training dataset. For both NEI approaches, their effect on relation extraction performance was investigated. The filtering and relation extraction components are both rule-based. Measurements: In addition to the official entry-level evaluation of the challenge, an entity-level analysis is also provided. Results: On the test data, the RB-NEI component achieved an entry-level F1-score of 80% for exact matching and 81% for inexact matching. The CRF produces a significantly weaker result, but CRF outperforms the rule-based model with 81% exact and 82% inexact F1-score (p

UR - http://www.scopus.com/inward/record.url?scp=78149481363&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=78149481363&partnerID=8YFLogxK

U2 - 10.1136/jamia.2010.004119

DO - 10.1136/jamia.2010.004119

M3 - Article

C2 - 20819860

AN - SCOPUS:78149481363

VL - 17

SP - 540

EP - 544

JO - Journal of the American Medical Informatics Association : JAMIA

JF - Journal of the American Medical Informatics Association : JAMIA

SN - 1067-5027

IS - 5

ER -