Fuzzy pseudo-thesaurus based clustering of a folkloristic corpus

S. Szaszkó, L. T. Kóczy, T. D. Gedeon

Research output: Contribution to journal › Conference article

Abstract

Automatic thesaurus extraction is essential for modern information retrieval. We develop a method for building a fuzzy pseudo-thesaurus from the co-occurrence of word pairs in documents. In this study we show that taking into account the Word Frequency Degree, computed over the whole corpus, makes the resulting pseudo-thesaurus usable. We found parameter settings for which a human expert confirmed that most of the extracted word pairs were related. The relationship within the extracted pairs and groups of words is often looser than synonymy, but these pairs and groups identify the frequently recurring topics of the corpus. We propose using groups of closely related words to define the different topics, and on this basis we performed a clustering of the documents.
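The abstract does not give the relatedness formula or the exact definition of the Word Frequency Degree, so the following is only a minimal sketch of the general pipeline under stated assumptions, not the authors' method. It assumes: (1) a word's fuzzy membership in a document is its term frequency divided by the document's largest term frequency; (2) the corpus-level "frequency degree" is the share of documents containing the word, and words outside a hypothetical `[min_deg, max_deg]` band are dropped; (3) the relatedness of two words is the fuzzy Jaccard ratio of their memberships, kept when it reaches an alpha-cut `alpha`; (4) topic groups are the connected components of the resulting word graph; (5) each document is assigned to the group with the largest summed membership. All function names, parameters, and the toy corpus are hypothetical.

```python
from collections import Counter
from itertools import combinations


def memberships(doc_tokens):
    """Assumed fuzzy membership of each word in one document: tf / max tf."""
    counts = Counter(doc_tokens)
    if not counts:
        return {}
    top = max(counts.values())
    return {w: c / top for w, c in counts.items()}


def frequency_degree(docs):
    """Hypothetical corpus-level frequency degree: share of documents containing the word."""
    if not docs:
        return {}
    df = Counter(w for doc in docs for w in set(doc))
    return {w: c / len(docs) for w, c in df.items()}


def pseudo_thesaurus(docs, alpha=0.5, min_deg=0.1, max_deg=0.6):
    """Word pairs whose fuzzy Jaccard co-occurrence degree reaches the alpha-cut."""
    deg = frequency_degree(docs)
    # Drop words that are too rare or too common across the whole corpus.
    vocab = sorted(w for w, d in deg.items() if min_deg <= d <= max_deg)
    mus = [memberships(doc) for doc in docs]
    pairs = {}
    for a, b in combinations(vocab, 2):
        num = sum(min(m.get(a, 0.0), m.get(b, 0.0)) for m in mus)
        den = sum(max(m.get(a, 0.0), m.get(b, 0.0)) for m in mus)
        r = num / den if den else 0.0
        if r >= alpha:
            pairs[(a, b)] = r
    return pairs


def word_groups(pairs):
    """Connected components of the alpha-cut word graph (a simplifying assumption)."""
    adj = {}
    for a, b in pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, groups = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:  # iterative depth-first search
            w = stack.pop()
            if w not in comp:
                comp.add(w)
                stack.extend(adj[w] - comp)
        seen |= comp
        groups.append(sorted(comp))
    return groups


def cluster_documents(docs, groups):
    """Assign each document to the group with the largest summed membership, else None."""
    labels = []
    for doc in docs:
        m = memberships(doc)
        scores = [sum(m.get(w, 0.0) for w in g) for g in groups]
        best = max(range(len(groups)), key=scores.__getitem__, default=None)
        labels.append(best if best is not None and scores[best] > 0 else None)
    return labels


# Hypothetical toy corpus; "the" occurs everywhere and is removed by max_deg.
TOY_DOCS = [s.split() for s in [
    "the wedding bride groom song",
    "the bride groom wedding dance",
    "the harvest wheat field song",
    "the wheat harvest field sickle",
    "the bride groom feast",
    "the harvest wheat feast",
]]

if __name__ == "__main__":
    pairs = pseudo_thesaurus(TOY_DOCS, alpha=0.6)
    groups = word_groups(pairs)
    print(groups)  # [['bride', 'groom', 'wedding'], ['field', 'harvest', 'wheat']]
    print(cluster_documents(TOY_DOCS, groups))  # [0, 0, 1, 1, 0, 1]
```

The corpus-level filter matters in this toy run: without `max_deg`, the ubiquitous word "the" would reach a relatedness of 0.5 with several content words and, at a lower alpha-cut, would chain unrelated topics into one group.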

Original language: English
Pages (from-to): 126-131
Number of pages: 6
Journal: IEEE International Conference on Fuzzy Systems
Publication status: Published - Sep 1 2005
Event: IEEE International Conference on Fuzzy Systems, FUZZ-IEEE 2005 - Reno, NV, United States
Duration: May 22 2005 to May 25 2005

Keywords

  • Fuzzy information retrieval
  • Fuzzy thesaurus

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Artificial Intelligence
  • Applied Mathematics
