Hybrid clustering for validation and improvement of subject-classification schemes

Frizo Janssens, Lin Zhang, Bart De Moor, Wolfgang Glänzel

Research output: Contribution to journal › Article

66 Citations (Scopus)

Abstract

A hybrid text/citation-based method is used to cluster journals covered by the Web of Science database in the period 2002-2006. The objective is to use this clustering to validate and, if possible, improve existing journal-based subject-classification schemes. Cross-citation links are determined by an item-by-paper procedure for individual papers assigned to the corresponding journal. Text mining for the textual component follows the same principle: textual characteristics of individual papers are attributed to the journals in which they were published. In a first step, the 22-field subject-classification scheme of the Essential Science Indicators (ESI) is evaluated and visualised. In a second step, the hybrid clustering method is applied to classify the roughly 8300 journals that meet the selection criteria concerning continuity, size and impact. The hybrid method proves superior to either of its two components applied separately. The choice of 22 clusters also allows a direct field-to-cluster comparison, and we substantiate that the science areas resulting from cluster analysis form a more coherent structure than the "intellectual" reference scheme, the ESI subject scheme. Moreover, the textual component of the hybrid method allows the clusters to be labelled using cognitive characteristics, while the citation component allows the cross-citation graph to be visualised and representative journals to be identified with the PageRank algorithm. Finally, the analysis of journal 'migration' allows existing classification schemes to be improved on the basis of the concordance between fields and clusters.
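
The abstract names PageRank as the means of picking representative journals from the cross-citation graph. The following is a minimal sketch of that idea, not the authors' implementation: it runs a standard power-iteration PageRank over a small weighted cross-citation matrix for the journals of one cluster. The journal names, the citation counts, the damping factor of 0.85 and the helper names `pagerank` and `representative_journal` are all illustrative assumptions; none of them comes from the paper.

```python
def pagerank(citations, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank on a weighted cross-citation matrix.

    citations[i][j] is the number of citations from journal i to journal j.
    Returns a list of scores that sums to 1.
    """
    n = len(citations)
    out_totals = [sum(row) for row in citations]
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        # Rank held by journals that cite nothing is spread uniformly.
        dangling = sum(rank[i] for i in range(n) if out_totals[i] == 0)
        new_rank = []
        for j in range(n):
            # Each citing journal passes on rank in proportion to its citations.
            inflow = sum(
                rank[i] * citations[i][j] / out_totals[i]
                for i in range(n)
                if out_totals[i] > 0
            )
            new_rank.append((1 - damping) / n + damping * (inflow + dangling / n))
        converged = sum(abs(a - b) for a, b in zip(new_rank, rank)) < tol
        rank = new_rank
        if converged:
            break
    return rank


def representative_journal(names, citations):
    """Return the name of the highest-ranked journal and all scores."""
    scores = pagerank(citations)
    top = max(range(len(names)), key=lambda i: scores[i])
    return names[top], scores


# Hypothetical three-journal cluster (made-up counts, self-citations excluded).
names = ["Journal A", "Journal B", "Journal C"]
citations = [
    [0, 5, 1],  # citations given by Journal A
    [2, 0, 1],  # citations given by Journal B
    [4, 6, 0],  # citations given by Journal C
]

best, scores = representative_journal(names, citations)
print(best, [round(s, 3) for s in scores])
```

In this toy cluster the journal that receives the largest share of the other journals' citations obtains the highest score. Applying the same procedure within each of the 22 clusters would be one plausible way to obtain a representative journal per cluster.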

Original language: English
Pages (from-to): 683-702
Number of pages: 20
Journal: Information Processing and Management
Volume: 45
Issue number: 6
Publication status: Published - Nov 1 2009

Keywords

  • Hybrid clustering
  • Journal cross-citation
  • Mapping of science
  • Subject classification

ASJC Scopus subject areas

  • Information Systems
  • Media Technology
  • Computer Science Applications
  • Management Science and Operations Research
  • Library and Information Sciences
