Fuzzy pseudo-thesaurus based clustering of a folkloristic corpus

S. Szaszkó, L. Kóczy, T. D. Gedeon

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Automatic thesaurus extraction is essential for modern information retrieval. We develop a method for constructing a fuzzy pseudo-thesaurus based on word-pair co-occurrence in documents. This study shows that taking into account the Word Frequency Degree, computed over the whole corpus, makes the resulting pseudo-thesaurus usable. Parameter settings were found for which most of the extracted word pairs were validated as related by a human expert. The relationship among the extracted pairs and groups of words is often looser than synonymy, but they identify the frequently recurring topics of the corpus. We suggest using groups of closely related words to define topics, and the documents were clustered on this basis.
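
The abstract does not state the formulas used, so the following is only a minimal sketch of the general idea of a co-occurrence based fuzzy pseudo-thesaurus, not the authors' method. It assumes a min/max (fuzzy Jaccard) relatedness degree between two words, with each word's membership in a document taken as its count divided by the largest count in that document. The function names, the threshold value, and the toy documents are hypothetical, and the paper's Word Frequency Degree is not reproduced here because its definition is not given in this record.

```python
from collections import Counter
from itertools import combinations


def membership(docs):
    """Fuzzy membership of each word in each document.

    Assumption for illustration: membership = word count / largest word
    count in the same document, so values lie in (0, 1].
    """
    result = []
    for text in docs:
        counts = Counter(text.lower().split())
        top = max(counts.values()) if counts else 1
        result.append({word: c / top for word, c in counts.items()})
    return result


def pseudo_thesaurus(docs, threshold=0.5):
    """Return {(w1, w2): degree} for pairs with degree >= threshold.

    The degree is sum(min) / sum(max) of the two words' memberships over
    all documents (an assumed measure, not taken from the paper).
    """
    mu = membership(docs)
    vocab = sorted({word for m in mu for word in m})
    pairs = {}
    for a, b in combinations(vocab, 2):
        num = sum(min(m.get(a, 0.0), m.get(b, 0.0)) for m in mu)
        den = sum(max(m.get(a, 0.0), m.get(b, 0.0)) for m in mu)
        degree = num / den if den else 0.0
        if degree >= threshold:
            pairs[(a, b)] = degree
    return pairs


# Hypothetical toy corpus, one string per document.
docs = [
    "king castle dragon",
    "king castle princess",
    "shepherd sheep wolf",
    "shepherd sheep",
]
related = pseudo_thesaurus(docs, threshold=0.6)
print(related)
```

With this toy corpus, only the pairs that always appear together pass the threshold. Such pairs could then serve as seeds for topic groups to which documents are assigned, which is the role the abstract gives to groups of closely related words.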

Original language: English
Title of host publication: IEEE International Conference on Fuzzy Systems
Editors: R. Krishnapuram, N. Pal
Pages: 126-131
Number of pages: 6
Publication status: Published - 2005
Event: IEEE International Conference on Fuzzy Systems, FUZZ-IEEE 2005 - Reno, NV, United States
Duration: May 22 2005 to May 25 2005

Other

Other: IEEE International Conference on Fuzzy Systems, FUZZ-IEEE 2005
Country: United States
City: Reno, NV
Period: 5/22/05 to 5/25/05

Fingerprint

Thesauri
Information retrieval

Keywords

  • Fuzzy information retrieval
  • Fuzzy thesaurus

ASJC Scopus subject areas

  • Software
  • Safety, Risk, Reliability and Quality
  • Chemical Health and Safety

Cite this

Szaszkó, S., Kóczy, L., & Gedeon, T. D. (2005). Fuzzy pseudo-thesaurus based clustering of a folkloristic corpus. In R. Krishnapuram, & N. Pal (Eds.), IEEE International Conference on Fuzzy Systems (pp. 126-131)

@inproceedings{0c5066c433ff495badbbd85fa964463d,
title = "Fuzzy pseudo-thesaurus based clustering of a folkloristic corpus",
abstract = "Automatic thesaurus extraction is essential for modern information retrieval. We develop a method for constructing a fuzzy pseudo-thesaurus based on word-pair co-occurrence in documents. This study shows that taking into account the Word Frequency Degree, computed over the whole corpus, makes the resulting pseudo-thesaurus usable. Parameter settings were found for which most of the extracted word pairs were validated as related by a human expert. The relationship among the extracted pairs and groups of words is often looser than synonymy, but they identify the frequently recurring topics of the corpus. We suggest using groups of closely related words to define topics, and the documents were clustered on this basis.",
keywords = "Fuzzy information retrieval, Fuzzy thesaurus",
author = "S. Szaszk{\'o} and L. K{\'o}czy and Gedeon, {T. D.}",
year = "2005",
language = "English",
pages = "126--131",
editor = "R. Krishnapuram and N. Pal",
booktitle = "IEEE International Conference on Fuzzy Systems",

}

TY - GEN

T1 - Fuzzy pseudo-thesaurus based clustering of a folkloristic corpus

AU - Szaszkó, S.

AU - Kóczy, L.

AU - Gedeon, T. D.

PY - 2005

Y1 - 2005

N2 - Automatic thesaurus extraction is essential for modern information retrieval. We develop a method for constructing a fuzzy pseudo-thesaurus based on word-pair co-occurrence in documents. This study shows that taking into account the Word Frequency Degree, computed over the whole corpus, makes the resulting pseudo-thesaurus usable. Parameter settings were found for which most of the extracted word pairs were validated as related by a human expert. The relationship among the extracted pairs and groups of words is often looser than synonymy, but they identify the frequently recurring topics of the corpus. We suggest using groups of closely related words to define topics, and the documents were clustered on this basis.

KW - Fuzzy information retrieval

KW - Fuzzy thesaurus

UR - http://www.scopus.com/inward/record.url?scp=23944510496&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=23944510496&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:23944510496

SP - 126

EP - 131

BT - IEEE International Conference on Fuzzy Systems

A2 - Krishnapuram, R.

A2 - Pal, N.

ER -