Exploring the use of fuzzy signature for text mining

Kok Wai Wong, Todsanai Chumwatana, D. Tikk

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

2 Citations (Scopus)

Abstract

The classical approaches to traditional text mining problems, such as document indexing, document clustering or text classification, represent text as a bag of words. Words, the units of the representation, are determined by tokenization, using, for example, whitespace and punctuation characters as separators. Bag-of-words methods run into problems with non-segmented text, which is typical of some Asian languages, because tokenization can no longer be used to determine the representation units. Several solutions have been proposed; among them, frequent max substring mining is adopted here because of its language independence and its favourable speed and storage requirements. In this paper we present a fuzzy signature based solution that uses frequent max substrings to represent non-segmented documents, and we propose how it could be applied to some typical text mining tasks. We show how the flexibility of fuzzy signatures can be exploited for these tasks. With the proposed concept, complex decision models in text mining may be constructed more effectively in the future.
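The tokenization problem described in the abstract can be shown concretely. The following is a minimal Python sketch that assumes a simple regular-expression tokenizer; the separator set and the function name bag_of_words are illustrative choices, not taken from the paper. On segmented text it yields a usable bag of words, while on text without separators (simulated here by an English sentence with its spaces removed, as an analogue of Thai script) the whole string comes back as a single token.

```python
import re
from collections import Counter

# Assumed separator set for illustration: runs of whitespace or common punctuation.
SEPARATORS = re.compile(r"[\s.,;:!?()]+")


def bag_of_words(text: str) -> Counter:
    """Tokenize on separators and count lower-cased tokens (a bag of words)."""
    return Counter(tok.lower() for tok in SEPARATORS.split(text) if tok)


print(bag_of_words("The cat sat on the mat. The cat slept."))
# 'the' -> 3, 'cat' -> 2, every other word -> 1

# Similar text written without separators, as in non-segmented scripts:
print(bag_of_words("thecatsatonthemat"))
# a single "word": Counter({'thecatsatonthemat': 1})
```

Without separators, the bag of words degenerates into one entry per document, so no two documents share representation units unless they are identical.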
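Frequent max substring mining avoids tokenization by taking representation units directly from repeated character sequences. The abstract does not define the term, so this sketch assumes a definition: a substring is frequent if it occurs at least min_freq times across the collection (overlapping occurrences counted), and it is "max" if no longer frequent substring that contains it has exactly the same count. The function names and parameters are hypothetical, and the naive enumeration is for illustration only; it does not reflect the speed and storage properties the abstract attributes to the technique.

```python
from collections import Counter
from typing import Dict, List


def count_substrings(docs: List[str], min_len: int, max_len: int) -> Counter:
    """Count every substring of length min_len..max_len, overlapping occurrences included."""
    counts: Counter = Counter()
    for doc in docs:
        n = len(doc)
        for i in range(n):
            for j in range(i + min_len, min(n, i + max_len) + 1):
                counts[doc[i:j]] += 1
    return counts


def frequent_max_substrings(docs: List[str], min_freq: int = 2,
                            min_len: int = 2, max_len: int = 20) -> Dict[str, int]:
    """Naive miner under the assumed definition: keep a frequent substring unless a longer
    frequent substring contains it and has exactly the same count."""
    counts = count_substrings(docs, min_len, max_len)
    frequent = {s: c for s, c in counts.items() if c >= min_freq}
    result: Dict[str, int] = {}
    for s, c in frequent.items():
        # s is absorbed if it only ever occurs inside one longer frequent substring t
        absorbed = any(len(t) > len(s) and s in t and ct == c
                       for t, ct in frequent.items())
        if not absorbed:
            result[s] = c
    return result


# Three "documents" written without separators.
docs = ["thecatsat", "thecatran", "adogsat"]
print(frequent_max_substrings(docs))
# {'thecat': 2, 'at': 4, 'sat': 2}
```

Shorter substrings such as "the" or "cat" are absorbed by "thecat" because they never occur outside it, while "at" survives because its four occurrences are split between "cat" and "sat". A practical system would likely need further filtering, and the paper's exact criteria may differ.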
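The abstract does not describe how documents are mapped to fuzzy signatures, so the following is only a sketch of the general fuzzy-signature idea: a tree whose leaves are membership degrees in [0, 1] and whose internal nodes carry aggregation operators such as max or the arithmetic mean. The two-level layout, the saturating membership function, the grouping of substrings and the leaf-wise similarity measure are all hypothetical choices made for illustration, as are the names Node, document_signature and similarity.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Union


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


@dataclass
class Node:
    """Internal node of a fuzzy signature: an aggregation operator over sub-signatures."""
    aggregate: Callable[[Sequence[float]], float]
    children: List["Signature"]


# A signature is either a leaf membership degree in [0, 1] or a Node.
Signature = Union[float, Node]


def evaluate(sig: Signature) -> float:
    """Reduce a signature bottom-up to a single membership degree."""
    if isinstance(sig, Node):
        return sig.aggregate([evaluate(child) for child in sig.children])
    return sig


def membership(count: int, saturation: int = 3) -> float:
    """Hypothetical leaf value: occurrence count scaled into [0, 1], capped at 1."""
    return min(1.0, count / saturation)


def document_signature(doc: str, groups: List[List[str]]) -> Node:
    """Hypothetical two-level signature: one max-aggregated subtree per substring group,
    combined at the root by the arithmetic mean."""
    subtrees: List[Signature] = []
    for group in groups:
        # str.count counts non-overlapping occurrences of each substring
        leaves: List[Signature] = [membership(doc.count(s)) for s in group]
        subtrees.append(Node(max, leaves))
    return Node(mean, subtrees)


def similarity(a: Signature, b: Signature) -> float:
    """Leaf-wise 1 - |a - b|, averaged up the tree; assumes both signatures share one structure."""
    if isinstance(a, Node) and isinstance(b, Node):
        return mean([similarity(x, y) for x, y in zip(a.children, b.children)])
    return 1.0 - abs(a - b)


# Groups of substrings, e.g. frequent max substrings arranged by hand.
groups = [["thecat", "sat"], ["at"]]
s1 = document_signature("thecatsat", groups)
s2 = document_signature("adogsat", groups)
print(evaluate(s1), evaluate(s2), similarity(s1, s2))
```

Here s1 evaluates to 0.5, s2 to about 0.333, and their similarity is approximately 0.75. Swapping the operator at any node (for example min instead of max) changes how a subtree is summarised without touching the rest of the tree, which is one way the flexibility mentioned in the abstract could be exercised.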

Original language: English
Title of host publication: 2010 IEEE World Congress on Computational Intelligence, WCCI 2010
DOI: https://doi.org/10.1109/FUZZY.2010.5584873
Publication status: Published - 2010
Event: 2010 6th IEEE World Congress on Computational Intelligence, WCCI 2010 - Barcelona, Spain
Duration: Jul 18 2010 → Jul 23 2010

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computational Theory and Mathematics

Cite this

Wong, K. W., Chumwatana, T., & Tikk, D. (2010). Exploring the use of fuzzy signature for text mining. In 2010 IEEE World Congress on Computational Intelligence, WCCI 2010 [5584873]. https://doi.org/10.1109/FUZZY.2010.5584873

@inproceedings{416e6f1da50f4769a5780ccea3767fb3,
title = "Exploring the use of fuzzy signature for text mining",
author = "Wong, Kok Wai and Chumwatana, Todsanai and Tikk, D.",
year = "2010",
doi = "10.1109/FUZZY.2010.5584873",
language = "English",
isbn = "9781424469208",
booktitle = "2010 IEEE World Congress on Computational Intelligence, WCCI 2010",

}
