Fullerene Data Mining Using Bibliometrics and Database Tomography

Ronald N. Kostoff, T. Braun, A. Schubert, Darrell Ray Toothman, James A. Humenik

Research output: Contribution to journal › Article

44 Citations (Scopus)

Abstract

Database tomography (DT) is a textual database analysis system consisting of two major components: (1) algorithms for extracting multiword phrase frequencies and phrase proximities (physical closeness of the multiword technical phrases) from any type of large textual database, to augment (2) interpretative capabilities of the expert human analyst. DT was used to derive technical intelligence from a fullerenes database derived from the Science Citation Index and the Engineering Compendex. Phrase frequency analysis by the technical domain experts provided the pervasive technical themes of the fullerenes database, and phrase proximity analysis provided the relationships among the pervasive technical themes. Bibliometric analysis of the fullerenes literature supplemented the DT results with author/journal/institution publication and citation data. Comparisons of fullerenes results with past analyses of similarly structured near-earth space, chemistry, hypersonic/supersonic flow, aircraft, and ship hydrodynamics databases are made. One important finding is that many of the normalized bibliometric distribution functions are extremely consistent across these diverse technical domains and could reasonably be expected to apply to broader chemical topics than fullerenes that span multiple structural classes. Finally, lessons learned about integrating the technical domain experts with the data mining tools are presented.
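
The two computational steps named in the abstract, phrase frequency and phrase proximity, can be illustrated with a minimal sketch. This is not the DT system's own code: the function names, the regex tokenizer, the fixed token window, and the default phrase length of two words are all assumptions made for illustration, and sentence boundaries are ignored for simplicity.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_frequencies(tokens, n=2):
    """Count every run of n adjacent tokens (an n-word phrase)."""
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


def phrase_proximities(tokens, theme, window=5, n=2):
    """Count n-word phrases found within `window` tokens before or
    after each occurrence of the theme phrase.

    Hypothetical proximity measure: a simple co-occurrence count
    inside a fixed token window on either side of the theme.
    """
    theme_tokens = tokenize(theme)
    k = len(theme_tokens)
    counts = Counter()
    for i in range(len(tokens) - k + 1):
        if tokens[i:i + k] == theme_tokens:
            before = tokens[max(0, i - window):i]
            after = tokens[i + k:i + k + window]
            counts.update(phrase_frequencies(before, n))
            counts.update(phrase_frequencies(after, n))
    return counts


if __name__ == "__main__":
    # Toy stand-in for a database of abstracts (invented text).
    corpus = (
        "Carbon nanotubes and fullerene films. "
        "Fullerene films show superconductivity. "
        "Doped fullerene films show superconductivity."
    )
    tokens = tokenize(corpus)
    # High-frequency phrases suggest candidate pervasive themes.
    print(phrase_frequencies(tokens).most_common(3))
    # Phrases near a chosen theme suggest related themes.
    print(phrase_proximities(tokens, "fullerene films", window=3).most_common(3))
```

In this sketch the ranked phrase lists are only raw material; as the abstract stresses, selecting the themes and interpreting the relationships among them remains the job of the technical domain expert.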

Original language: English
Pages (from-to): 19-39
Number of pages: 21
Journal: Journal of Chemical Information and Computer Sciences
Volume: 40
Issue number: 1
Publication status: Published - Jan 2000

Fingerprint

Fullerenes
Tomography
Data mining
expert
scientific-technical intelligence
frequency analysis
systems analysis
aircraft
chemistry
engineering
science
Supersonic flow
Hypersonic aerodynamics
Distribution functions
Ships
Hydrodynamics
Earth (planet)

ASJC Scopus subject areas

  • Chemistry(all)
  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Cite this

Fullerene Data Mining Using Bibliometrics and Database Tomography. / Kostoff, Ronald N.; Braun, T.; Schubert, A.; Toothman, Darrell Ray; Humenik, James A.

In: Journal of Chemical Information and Computer Sciences, Vol. 40, No. 1, 01.2000, p. 19-39.

@article{7db29b554980440bbbca9b819b4bea39,
title = "Fullerene Data Mining Using Bibliometrics and Database Tomography",
author = "Kostoff, {Ronald N.} and T. Braun and A. Schubert and Toothman, {Darrell Ray} and Humenik, {James A.}",
year = "2000",
month = "1",
language = "English",
volume = "40",
pages = "19--39",
journal = "Journal of Chemical Information and Modeling",
issn = "1549-9596",
publisher = "American Chemical Society",
number = "1",

}

TY - JOUR

T1 - Fullerene Data Mining Using Bibliometrics and Database Tomography

AU - Kostoff, Ronald N.

AU - Braun, T.

AU - Schubert, A.

AU - Toothman, Darrell Ray

AU - Humenik, James A.

PY - 2000/1

Y1 - 2000/1

UR - http://www.scopus.com/inward/record.url?scp=0003079213&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0003079213&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:0003079213

VL - 40

SP - 19

EP - 39

JO - Journal of Chemical Information and Modeling

JF - Journal of Chemical Information and Modeling

SN - 1549-9596

IS - 1

ER -