Simulation of random dendrograms and comparison tests: Some comments

Research output: Contribution to journal (Article)

29 Citations (Scopus)

Abstract

It is shown that there is a simple, easily understood alternative to the double permutation algorithm for generating random, fully ranked dendrograms. The paper also examines the utility of five different dendrogram descriptors in statistical analyses of dendrogram similarity. These descriptors serve as a logical basis for comparisons under different simulation models: cophenetic difference is valid for weighted dendrograms, partition membership divergence for fully ranked dendrograms, whereas subtree membership divergence and cluster membership divergence are best suited to partially ranked dendrograms. The latter two descriptors possess the ultrametric property for all triples, but are called quasi-ultrametrics because they do not satisfy the identity axiom. The fifth descriptor considered is path difference, which is not recommended for comparisons except for unrooted trees. Correlations among dendrogram descriptors are evaluated through simulation experiments, and it is shown that the significance of dendrogram comparisons is greatly influenced by the choice of the descriptor. The paper emphasizes that the choice of the underlying tree distribution to be used as a reference in testing the significance of a dendrogram comparison measure should be consistent with the descriptor incorporated by that measure.
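The abstract does not spell out the paper's generation algorithm, so the following is only an illustrative sketch, not the author's method. It assumes one simple way to obtain a random, fully ranked dendrogram: at each of the n-1 levels, merge a uniformly chosen pair of the current clusters. It then builds a rank-based descriptor matrix (the level at which two objects first share a cluster) and checks the ultrametric property d(i,k) <= max(d(i,j), d(j,k)) for all triples. The function names `random_ranked_dendrogram`, `cophenetic_ranks`, and `is_ultrametric` are hypothetical.

```python
import itertools
import random


def random_ranked_dendrogram(n, rng):
    """Assumed scheme: merge a uniformly chosen pair of clusters per level.

    Returns a list of (level, cluster_a, cluster_b); every merge gets its
    own level 1..n-1, so the resulting dendrogram is fully ranked.
    """
    clusters = [frozenset([i]) for i in range(n)]
    merges = []
    for level in range(1, n):
        a, b = rng.sample(clusters, 2)  # two distinct clusters at random
        clusters.remove(a)
        clusters.remove(b)
        clusters.append(a | b)
        merges.append((level, a, b))
    return merges


def cophenetic_ranks(n, merges):
    """d[i][j] = level at which objects i and j first fall in one cluster."""
    d = [[0] * n for _ in range(n)]
    for level, a, b in merges:
        for i in a:
            for j in b:
                d[i][j] = d[j][i] = level
    return d


def is_ultrametric(d):
    """Check d(i,k) <= max(d(i,j), d(j,k)) for every triple of objects."""
    n = len(d)
    return all(
        d[i][k] <= max(d[i][j], d[j][k])
        for i, j, k in itertools.product(range(n), repeat=3)
    )


rng = random.Random(0)  # fixed seed so the run is repeatable
merges = random_ranked_dendrogram(6, rng)
d = cophenetic_ranks(6, merges)
print(is_ultrametric(d))
```

Because merge levels strictly increase, any matrix built this way satisfies the ultrametric inequality; the check is included to make the property concrete rather than to test the generator.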

Original language: English
Pages (from-to): 123-142
Number of pages: 20
Journal: Journal of Classification
Volume: 17
Issue number: 1
Publication status: Published - Jan 1 2000

Keywords

  • Clustering methodology
  • Dendrogram topology
  • Matrix correlation
  • Monte Carlo studies

ASJC Scopus subject areas

  • Mathematics (miscellaneous)
  • Psychology (miscellaneous)
  • Statistics, Probability and Uncertainty
  • Library and Information Sciences
