Database tomography (DT) is a textual database analysis system consisting of two major components: (1) algorithms for extracting multiword phrase frequencies and phrase proximities (the physical closeness of multiword technical phrases) from any type of large textual database, which augment (2) the interpretative capabilities of the expert human analyst. DT was used to derive technical intelligence from a fullerenes database drawn from the Science Citation Index and the Engineering Compendex. Phrase frequency analysis by the technical domain experts identified the pervasive technical themes of the fullerenes database, and phrase proximity analysis identified the relationships among those themes. Bibliometric analysis of the fullerenes literature supplemented the DT results with author, journal, and institution publication and citation data. The fullerenes results are compared with past analyses of similarly structured databases on near-earth space, chemistry, hypersonic/supersonic flow, aircraft, and ship hydrodynamics. One important finding is that many of the normalized bibliometric distribution functions are extremely consistent across these diverse technical domains and could reasonably be expected to apply to chemical topics that are broader than fullerenes and span multiple structural classes. Finally, lessons learned about integrating the technical domain experts with the data mining tools are presented.
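The two algorithmic steps named in the abstract, phrase frequency and phrase proximity, can be illustrated with a minimal sketch. This is not the DT implementation, which the abstract does not describe; it assumes that a "phrase" is any run of `n` adjacent words and that "proximity" means falling within a fixed word window around a chosen theme phrase. The function names, the `window` parameter, and the sample text are all hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_frequencies(tokens, n=2):
    """Count every run of n adjacent words (assumed definition of a phrase)."""
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


def phrase_proximities(tokens, theme, n=2, window=5):
    """Count n-word phrases lying within `window` words of each occurrence
    of `theme`, skipping phrases that overlap that occurrence.
    The fixed word window is an assumption, not the published method."""
    theme_tokens = theme.lower().split()
    k = len(theme_tokens)
    counts = Counter()
    for i in range(len(tokens) - k + 1):
        if tokens[i:i + k] != theme_tokens:
            continue
        lo = max(0, i - window)
        hi = min(len(tokens), i + k + window)
        for j in range(lo, hi - n + 1):
            # Keep only phrases entirely before or entirely after the theme.
            if j + n <= i or j >= i + k:
                counts[" ".join(tokens[j:j + n])] += 1
    return counts


# Hypothetical sample text, invented for illustration only.
sample = (
    "Fullerene films were grown on silicon. "
    "Thin fullerene films show superconductivity. "
    "Doped fullerene films show superconductivity at low temperature."
)
tokens = tokenize(sample)
frequencies = phrase_frequencies(tokens, n=2)
neighbours = phrase_proximities(tokens, "fullerene films", n=2, window=3)

print(frequencies.most_common(3))
print(neighbours.most_common(3))
```

In this reading, the frequency counts suggest candidate themes for the domain expert to review, and the proximity counts suggest which other phrases tend to appear near a selected theme. Selecting and interpreting the phrases remains the analyst's task, as the abstract emphasizes.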
| Number of pages | 21 |
| Journal | Journal of Chemical Information and Computer Sciences |
| Publication status | Published - Jan 1 2000 |
ASJC Scopus subject areas
- Information Systems
- Computer Science Applications
- Computational Theory and Mathematics