Fullerene Data Mining Using Bibliometrics and Database Tomography

Ronald N. Kostoff, Tibor Braun, Andras Schubert, Darrell Ray Toothman, James A. Humenik

Research output: Contribution to journal (Article)

44 Citations (Scopus)


Database tomography (DT) is a textual database analysis system consisting of two major components: (1) algorithms for extracting multiword phrase frequencies and phrase proximities (physical closeness of the multiword technical phrases) from any type of large textual database, to augment (2) interpretative capabilities of the expert human analyst. DT was used to derive technical intelligence from a fullerenes database derived from the Science Citation Index and the Engineering Compendex. Phrase frequency analysis by the technical domain experts provided the pervasive technical themes of the fullerenes database, and phrase proximity analysis provided the relationships among the pervasive technical themes. Bibliometric analysis of the fullerenes literature supplemented the DT results with author/journal/institution publication and citation data. Comparisons of fullerenes results with past analyses of similarly structured near-earth space, chemistry, hypersonic/supersonic flow, aircraft, and ship hydrodynamics databases are made. One important finding is that many of the normalized bibliometric distribution functions are extremely consistent across these diverse technical domains and could reasonably be expected to apply to broader chemical topics than fullerenes that span multiple structural classes. Finally, lessons learned about integrating the technical domain experts with the data mining tools are presented.
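The two algorithmic steps named in the abstract, multiword phrase frequency and phrase proximity, can be illustrated with a minimal Python sketch. This is not the authors' DT implementation. The function names, the simple regex tokenizer, the fixed phrase length `n`, and the `window` size are all assumptions chosen for illustration, and the sketch leaves out stop-word filtering and the expert review that the abstract describes.

```python
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase runs of letters and digits.
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_frequencies(tokens, n=2):
    # Count every run of n adjacent words (a simple n-gram count).
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


def phrase_proximities(tokens, theme, window=5):
    # Count words found within `window` words of each occurrence of
    # the theme phrase, on either side of it.
    theme_tokens = theme.lower().split()
    n = len(theme_tokens)
    nearby = Counter()
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == theme_tokens:
            left = tokens[max(0, i - window):i]
            right = tokens[i + n:i + n + window]
            nearby.update(left + right)
    return nearby


# Made-up sample text, not taken from the fullerenes database.
sample = (
    "Carbon nanotubes grow from fullerene soot. "
    "Carbon nanotubes conduct electricity."
)
tokens = tokenize(sample)
frequencies = phrase_frequencies(tokens, n=2)
neighbours = phrase_proximities(tokens, "carbon nanotubes", window=2)
```

In this sketch, `frequencies.most_common()` would give candidate pervasive themes for an analyst to review, and `neighbours` lists the words that occur close to one chosen theme, which is the raw material for judging relationships among themes.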

Original language: English
Pages (from-to): 19-39
Number of pages: 21
Journal: Journal of Chemical Information and Computer Sciences
Issue number: 1
Publication status: Published - Jan 1 2000


ASJC Scopus subject areas

  • Chemistry (all)
  • Information Systems
  • Computer Science Applications
  • Computational Theory and Mathematics
