Wikifying novel words to mixtures of Wikipedia senses by structured sparse coding

Balázs Pintér, Gyula Vörös, Zoltán Szabó, András Lőrincz

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

We extend the scope of Wikification to novel words by relaxing two premises of Wikification: (i) we wikify without using the surface form of the word, and (ii) we wikify to a mixture of Wikipedia senses instead of a single sense. We identify two types of “novel” words: words where the connection between their surface form and their meaning is broken (e.g., a misspelled word), and words where there is no meaning to connect to: the meaning itself is also novel. We propose a method capable of wikifying both types of novel words while also dealing with the inherently large-scale disambiguation problem. We show that the method can disambiguate between up to 1,000 Wikipedia senses, and it can explain words with novel meaning as a mixture of other, possibly related senses. This mixture representation compares favorably to the widely used bag of words representation.

Original language: English
Title of host publication: Pattern Recognition Applications and Methods - International Conference, ICPRAM 2013, Revised Selected Papers
Editors: Maria De Marsico, Ana Fred
Publisher: Springer Verlag
Pages: 241-255
Number of pages: 15
ISBN (Electronic): 9783319126098
DOI: https://doi.org/10.1007/978-3-319-12610-4_15
Publication status: Published - Jan 1 2015
Event: 2nd International Conference on Pattern Recognition, ICPRAM 2013 - Barcelona, Spain
Duration: Feb 15 2013 to Feb 18 2013

Publication series

Name: Advances in Intelligent Systems and Computing
Volume: 318
ISSN (Print): 2194-5357

Other

Other: 2nd International Conference on Pattern Recognition, ICPRAM 2013
Country: Spain
City: Barcelona
Period: 2/15/13 to 2/18/13

Keywords

  • Interpreting novel words
  • Link disambiguation
  • Natural language processing
  • Structured sparse coding
  • Wikification

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Computer Science (all)

Cite this

Pintér, B., Vörös, G., Szabó, Z., & Lőrincz, A. (2015). Wikifying novel words to mixtures of Wikipedia senses by structured sparse coding. In M. De Marsico, & A. Fred (Eds.), Pattern Recognition Applications and Methods - International Conference, ICPRAM 2013, Revised Selected Papers (pp. 241-255). (Advances in Intelligent Systems and Computing; Vol. 318). Springer Verlag. https://doi.org/10.1007/978-3-319-12610-4_15