OptimClass: Using species-to-cluster fidelity to determine the optimal partition in classification of ecological communities

Lubomír Tichý, Milan Chytrý, Michal Hájek, Stephen S. Talbot, Z. Botta-Dukát

Research output: Contribution to journal › Article

64 Citations (Scopus)

Abstract

Question: Community ecologists are often confronted with multiple possible partitions of a single set of records of species composition and/or abundances from several sites. Different methods of numerical classification produce different results, and the question is which of them, and how many clusters, should be selected for interpretation. We demonstrate a new method for identifying the optimal partition from a series of partitions of the same set of sites, based on the number of species with high fidelity to clusters in a partition (faithful species).

Methods: The new method, OptimClass, has two variants. OptimClass 1 searches for the partition with the maximum number of faithful species across all clusters, while OptimClass 2 searches for the partition with the maximum number of clusters that contain at least a preselected minimum number of faithful species. Faithful species are identified using the P value of Fisher's exact test as a measure of fidelity. OptimClass was tested on three vegetation datasets that varied in species richness and internal heterogeneity, using several classification algorithms, resemblance measures and cover transformations.

Results: Results from both variants of OptimClass depended on the preselected threshold P value for faithful species: a higher P threshold gave a higher probability that a partition with more clusters was selected as optimal. Good partitions, in terms of OptimClass criteria, involved flexible beta clustering, and also ordinal clustering. Good partitions were also obtained with TWINSPAN when the required number of clusters was small, or with UPGMA when the required number of clusters was large. Poor partitions usually resulted from classifications that used resemblance measures and cover transformations emphasizing differences in species cover; this is not unexpected because OptimClass uses a presence/absence-based fidelity measure.

Conclusions: If the aim of a classification is to obtain clusters rich in faithful species, which can subsequently be used as diagnostic species for identification of community types, OptimClass is a suitable method for simultaneous choice of the optimal classification algorithm and the optimal number of clusters. It can be computed in the JUICE program.
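
The two criteria described in the Methods can be illustrated with a minimal Python sketch. It is not the JUICE implementation: the function names, the toy presence/absence data, the one-sided (right-tail) form of Fisher's exact test, the strict `<` comparison against the threshold, and the choice to count a species once for each cluster it is faithful to are all assumptions made for illustration.

```python
from math import comb

def fisher_p(n_sites, cluster_size, species_freq, joint):
    """Right-tail Fisher's exact test P value (assumed one-sided form):
    probability of at least `joint` occurrences of a species inside a
    cluster, given its total frequency, under the hypergeometric null."""
    total = comb(n_sites, cluster_size)
    top = min(cluster_size, species_freq)
    tail = sum(comb(species_freq, k) * comb(n_sites - species_freq, cluster_size - k)
               for k in range(joint, top + 1))
    return tail / total

def faithful_per_cluster(presence, labels, p_max):
    """Count faithful species (P < p_max) for every cluster.
    presence: list of sites, each a list of 0/1 values per species.
    labels: cluster identifier for each site."""
    n_sites = len(presence)
    n_species = len(presence[0])
    freq = [sum(row[s] for row in presence) for s in range(n_species)]
    counts = {}
    for cluster in sorted(set(labels)):
        members = [row for row, lab in zip(presence, labels) if lab == cluster]
        counts[cluster] = 0
        for s in range(n_species):
            joint = sum(row[s] for row in members)
            if fisher_p(n_sites, len(members), freq[s], joint) < p_max:
                counts[cluster] += 1
    return counts

def optimclass1(presence, labels, p_max):
    """Variant 1: number of faithful species summed across all clusters."""
    return sum(faithful_per_cluster(presence, labels, p_max).values())

def optimclass2(presence, labels, p_max, min_faithful):
    """Variant 2: number of clusters with at least min_faithful faithful species."""
    counts = faithful_per_cluster(presence, labels, p_max)
    return sum(1 for n in counts.values() if n >= min_faithful)

# Hypothetical toy data: 8 sites (rows) by 4 species (columns).
presence = [
    [1, 0, 1, 1], [1, 0, 1, 1], [1, 0, 1, 0], [1, 0, 1, 0],
    [0, 1, 1, 1], [0, 1, 1, 1], [0, 1, 1, 0], [0, 1, 1, 0],
]
# Two candidate partitions of the same sites.
partitions = {
    "A": [0, 0, 0, 0, 1, 1, 1, 1],
    "B": [0, 0, 1, 1, 2, 2, 3, 3],
}
scores = {name: optimclass1(presence, labels, 0.05)
          for name, labels in partitions.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

In this toy case the two-cluster partition is preferred, because splitting the sites into four clusters of two leaves no species with a P value below the threshold. In practice the same comparison would be run over partitions produced by different algorithms and at different numbers of clusters.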

Original language: English
Pages (from-to): 287-299
Number of pages: 13
Journal: Journal of Vegetation Science
Volume: 21
Issue number: 2
DOI: 10.1111/j.1654-1103.2009.01143.x
Publication status: Published - Apr 2010

Keywords

  • Cluster analysis
  • Cover transformation
  • Dendrogram
  • Optimal number of clusters
  • Ordinal clustering
  • Resemblance measures
  • Stopping rules
  • TWINSPAN

ASJC Scopus subject areas

  • Ecology
  • Plant Science

Cite this

OptimClass: Using species-to-cluster fidelity to determine the optimal partition in classification of ecological communities. / Tichý, Lubomír; Chytrý, Milan; Hájek, Michal; Talbot, Stephen S.; Botta-Dukát, Z.

In: Journal of Vegetation Science, Vol. 21, No. 2, 04.2010, p. 287-299.

@article{780e399bbe0744afa2ac09cb0bc5a3ef,
title = "OptimClass: Using species-to-cluster fidelity to determine the optimal partition in classification of ecological communities",
keywords = "Cluster analysis, Cover transformation, Dendrogram, Optimal number of clusters, Ordinal clustering, Resemblance measures, Stopping rules, TWINSPAN",
author = "Lubom{\'i}r Tich{\'y} and Milan Chytr{\'y} and Michal H{\'a}jek and Talbot, {Stephen S.} and Z. Botta-Duk{\'a}t",
year = "2010",
month = "4",
doi = "10.1111/j.1654-1103.2009.01143.x",
language = "English",
volume = "21",
pages = "287--299",
journal = "Journal of Vegetation Science",
issn = "1100-9233",
publisher = "Wiley-Blackwell",
number = "2",

}

TY - JOUR

T1 - OptimClass

T2 - Using species-to-cluster fidelity to determine the optimal partition in classification of ecological communities

AU - Tichý, Lubomír

AU - Chytrý, Milan

AU - Hájek, Michal

AU - Talbot, Stephen S.

AU - Botta-Dukát, Z.

PY - 2010/4

Y1 - 2010/4

KW - Cluster analysis

KW - Cover transformation

KW - Dendrogram

KW - Optimal number of clusters

KW - Ordinal clustering

KW - Resemblance measures

KW - Stopping rules

KW - TWINSPAN

UR - http://www.scopus.com/inward/record.url?scp=77949690339&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77949690339&partnerID=8YFLogxK

U2 - 10.1111/j.1654-1103.2009.01143.x

DO - 10.1111/j.1654-1103.2009.01143.x

M3 - Article

AN - SCOPUS:77949690339

VL - 21

SP - 287

EP - 299

JO - Journal of Vegetation Science

JF - Journal of Vegetation Science

SN - 1100-9233

IS - 2

ER -