Co-clustering approaches to integrate lexical and bibliographical information

Frizo Janssens, Patrick Glenisson, Wolfgang Glänzel, Bart De Moor

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Terms are the building blocks for organizing and accessing information, and they hold a key position in information retrieval. In forthcoming work, we show how a methodology that indexes full-text scientific articles and combines this with an exploratory statistical analysis can improve on bibliometric approaches to mapping science. Textual documents are indexed and further characterized using data mining techniques and co-word analysis. We start this paper by briefly demonstrating the text mining approach. Statistical processing of full-text documents provides a relational view based on the topics those documents cover, whereas bibliometric components can capture other characteristics that describe each document's position in the set. We therefore extend previous work and explore how hybrid methodologies that deeply combine text analysis and bibliometric methods can improve the mapping of science and technology. In particular, we propose a method to mathematically combine document similarity matrices derived from vector-based indices on the one hand and from selected bibliometric indicators on the other. Weighted linear combinations as well as approaches inspired by statistical meta-analysis are presented. Both pitfalls and possible solutions are discussed. The resulting combined similarity matrix offers an attractive way to 'co-cluster' documents based on both lexical and bibliographic information.
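To make the idea of a weighted linear combination concrete, the sketch below builds two document-by-document similarity matrices and mixes them. It is a minimal illustration under stated assumptions, not the authors' implementation: lexical similarity is taken as the cosine of term-weight vectors, bibliographic similarity as the cosine of reference-incidence vectors (a bibliographic-coupling style measure), and the mixing weight `alpha` as well as the function names `cosine_similarity_matrix` and `combine_similarities` are hypothetical.

```python
import math


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarities between equal-length numeric vectors.

    A zero vector gets similarity 0 with every document (including itself).
    """
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    n = len(vectors)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] == 0 or norms[j] == 0:
                continue  # leave 0.0 for empty documents
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            sim[i][j] = dot / (norms[i] * norms[j])
    return sim


def combine_similarities(s_text, s_biblio, alpha):
    """Hypothetical weighted linear combination: alpha*S_text + (1 - alpha)*S_biblio."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    n = len(s_text)
    return [
        [alpha * s_text[i][j] + (1 - alpha) * s_biblio[i][j] for j in range(n)]
        for i in range(n)
    ]


# Toy data (invented for illustration): 3 documents.
# Rows of term_weights: weights of 4 index terms per document.
term_weights = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 0, 1, 1]]
# Rows of references: 1 if the document cites that reference, else 0.
references = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]

s_text = cosine_similarity_matrix(term_weights)
s_biblio = cosine_similarity_matrix(references)
combined = combine_similarities(s_text, s_biblio, alpha=0.5)
```

With `alpha=0.5`, documents 0 and 1 share one index term (lexical similarity 0.5) and all their references (bibliographic similarity 1), so their combined similarity is 0.75. Because the two matrices may be distributed quite differently, a real application would likely need to put them on comparable scales before mixing; this sketch does not attempt that, and the combined matrix would then be handed to a clustering algorithm of choice.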

Original language: English
Title of host publication: Proceedings of ISSI 2005
Subtitle of host publication: 10th International Conference of the International Society for Scientometrics and Informetrics
Pages: 284-289
Number of pages: 6
Publication status: Published - Dec 1 2005
Event: 10th Biennial International Conference of the International Society for Scientometrics and Informetrics, ISSI 2005 - Stockholm, Sweden
Duration: Jul 24 2005 to Jul 28 2005

Publication series

Name: Proceedings of ISSI 2005: 10th International Conference of the International Society for Scientometrics and Informetrics
Volume: 1

ASJC Scopus subject areas

  • Computer Science Applications
  • Management Science and Operations Research
  • Applied Mathematics
  • Modelling and Simulation
  • Statistics and Probability

Cite this

Janssens, F., Glenisson, P., Glänzel, W., & De Moor, B. (2005). Co-clustering approaches to integrate lexical and bibliographical information. In Proceedings of ISSI 2005: 10th International Conference of the International Society for Scientometrics and Informetrics (pp. 284-289). (Proceedings of ISSI 2005: 10th International Conference of the International Society for Scientometrics and Informetrics; Vol. 1).