Joint optimization of cluster number and abundance transformation for obtaining effective vegetation classifications

Attila Lengyel, Flavia Landucci, Ladislav Mucina, James L. Tsakalos, Z. Botta-Dukát

Research output: Contribution to journal › Article

3 Citations (Scopus)

Abstract

Question: Is it possible to determine which combination of cluster number and taxon abundance transformation produces the most effective classification of vegetation data? What is the effect of changing cluster number and taxon abundance weighting (applied simultaneously) on the stability and biological interpretation of vegetation classifications?

Locality: Europe, Western Australia, simulated data.

Methods: Real data sets representing Hungarian sub-montane grasslands, European wetlands and Western Australian kwongan vegetation, as well as simulated data sets, were used. The data sets were classified using the partitioning around medoids method. We generated classification solutions by gradually changing both the number of clusters and the exponent of the power transformation applied to the species' projected covers. The effectiveness of each classification was assessed with a stability index based on bootstrap resampling of the original data set with subsequent elimination of duplicates. The vegetation types delimited by the most stable classification were compared with those of other classifications obtained at local maxima of the stability values. The effect of changing the transformation exponent on the number of clusters, as indexed by their stability, was evaluated.

Results: The optimal number of clusters varied with the power exponent in all cases, for both the real and the simulated data sets. With the real data sets, the optimal cluster numbers obtained with different data transformations recovered interpretable biological patterns. With the simulated data, the stability optima identified the simulated number of clusters correctly in most cases.

Conclusions: Changing the data transformation and the number of clusters produces classifications of differing stability. Highly stable classifications can be obtained from different combinations of cluster number and data transformation. Despite similarly high stability, such classifications may reveal contrasting biological patterns and thus suggest different interpretations. We suggest testing a wide range of available combinations to find the parameters that yield the most effective classifications.
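The workflow described under Methods (power-transform the covers, cluster with partitioning around medoids, then score every exponent/cluster-number pair by bootstrap stability with duplicates removed) can be sketched in dependency-free Python as below. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not name the dissimilarity measure or the exact form of the stability index, so Bray-Curtis dissimilarity, the mean adjusted Rand index as the stability score, the number of bootstrap replicates, the toy data and the names `power_transform`, `pam`, `stability`, `grid_search` and `TOY_COVERS` are all hypothetical choices. It also assumes the data set has more plots than clusters.

```python
import random
from collections import Counter
from math import comb

# Hypothetical toy data: 6 plots x 5 species, percentage projected covers.
TOY_COVERS = [
    [50, 30, 0, 0, 5], [60, 20, 0, 0, 0], [55, 25, 5, 0, 0],
    [0, 0, 40, 60, 0], [0, 5, 50, 45, 0], [0, 0, 45, 55, 10],
]


def power_transform(covers, p):
    """Raise every nonzero cover to the exponent p; p = 0 gives presence/absence."""
    # Zeros are kept explicitly because 0 ** 0 == 1 in Python.
    return [[c ** p if c > 0 else 0 for c in row] for row in covers]


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two plots (assumed measure)."""
    total = sum(x + y for x, y in zip(a, b))
    return sum(abs(x - y) for x, y in zip(a, b)) / total if total > 0 else 0.0


def pam(dist, idx, k):
    """Partitioning around medoids on the plots in idx; maps each plot to its medoid."""
    def cost(meds):
        return sum(min(dist[i][m] for m in meds) for i in idx)

    # BUILD: add, one at a time, the medoid that lowers the total cost most.
    medoids = []
    for _ in range(k):
        medoids.append(min((j for j in idx if j not in medoids),
                           key=lambda j: cost(medoids + [j])))
    # SWAP: apply the best medoid/non-medoid exchange until none lowers the cost.
    current = cost(medoids)
    while True:
        best_cost, best = current, None
        for pos in range(k):
            for o in idx:
                if o not in medoids:
                    trial = medoids[:pos] + [o] + medoids[pos + 1:]
                    c = cost(trial)
                    if c < best_cost - 1e-12:
                        best_cost, best = c, trial
        if best is None:
            break
        medoids, current = best, best_cost
    # Medoids label themselves; other plots join their nearest medoid.
    return {i: i if i in medoids else min(medoids, key=lambda m: dist[i][m]) for i in idx}


def adjusted_rand(u, v):
    """Adjusted Rand index between two labelings of the same plots."""
    pairs = comb(len(u), 2)
    if pairs == 0:
        return 1.0
    same_both = sum(comb(c, 2) for c in Counter(zip(u, v)).values())
    same_u = sum(comb(c, 2) for c in Counter(u).values())
    same_v = sum(comb(c, 2) for c in Counter(v).values())
    expected = same_u * same_v / pairs
    maximum = (same_u + same_v) / 2
    if maximum == expected:  # trivial partitions: treat as perfect agreement
        return 1.0
    return (same_both - expected) / (maximum - expected)


def stability(covers, p, k, n_boot=20, seed=1):
    """Mean adjusted Rand index between the full-data partition and partitions of
    bootstrap samples with duplicate plots removed (assumed form of the index)."""
    x = power_transform(covers, p)
    n = len(x)
    dist = [[bray_curtis(x[i], x[j]) for j in range(n)] for i in range(n)]
    reference = pam(dist, list(range(n)), k)
    rng = random.Random(seed)
    scores = []
    while len(scores) < n_boot:
        # Draw n plots with replacement, then keep each drawn plot only once.
        sample = sorted({rng.randrange(n) for _ in range(n)})
        if len(sample) <= k:
            continue  # too few distinct plots to form k clusters
        boot = pam(dist, sample, k)
        scores.append(adjusted_rand([reference[i] for i in sample],
                                    [boot[i] for i in sample]))
    return sum(scores) / len(scores)


def grid_search(covers, exponents, ks, **kwargs):
    """Stability of every (exponent, k) combination, most stable first."""
    results = {(p, k): stability(covers, p, k, **kwargs) for p in exponents for k in ks}
    return sorted(results.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for (p, k), s in grid_search(TOY_COVERS, [0, 0.25, 0.5, 1], [2, 3]):
        print(f"exponent={p:<5} k={k}  stability={s:.3f}")
```

Because Bray-Curtis dissimilarity between two plots does not depend on the other plots, the sketch computes the distance matrix once per exponent and reuses it for every bootstrap subset. Scanning the full grid, rather than fixing the exponent first, mirrors the abstract's point that several (exponent, k) combinations may reach similarly high stability.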

Original language: English
Journal: Journal of Vegetation Science
DOI: 10.1111/jvs.12604
Publication status: Accepted/In press (Jan 1 2018)

Keywords

  • Cluster validation
  • Clustering
  • Community similarity
  • Cover scale
  • Data type
  • Multivariate data analysis
  • Numerical classification
  • Stability of classification

ASJC Scopus subject areas

  • Ecology
  • Plant Science

Cite this

Joint optimization of cluster number and abundance transformation for obtaining effective vegetation classifications. / Lengyel, Attila; Landucci, Flavia; Mucina, Ladislav; Tsakalos, James L.; Botta-Dukát, Z.

In: Journal of Vegetation Science, 01.01.2018.

@article{f70e1d8ffbbd43e4a3f22cb9622cd2a7,
title = "Joint optimization of cluster number and abundance transformation for obtaining effective vegetation classifications",
keywords = "Cluster validation, Clustering, Community similarity, Cover scale, Data type, Multivariate data analysis, Numerical classification, Stability of classification",
author = "Attila Lengyel and Flavia Landucci and Ladislav Mucina and Tsakalos, {James L.} and Z. Botta-Duk{\'a}t",
year = "2018",
month = "1",
day = "1",
doi = "10.1111/jvs.12604",
language = "English",
journal = "Journal of Vegetation Science",
issn = "1100-9233",
publisher = "Wiley-Blackwell",

}

TY - JOUR

T1 - Joint optimization of cluster number and abundance transformation for obtaining effective vegetation classifications

AU - Lengyel, Attila

AU - Landucci, Flavia

AU - Mucina, Ladislav

AU - Tsakalos, James L.

AU - Botta-Dukát, Z.

PY - 2018/1/1

Y1 - 2018/1/1

KW - Cluster validation

KW - Clustering

KW - Community similarity

KW - Cover scale

KW - Data type

KW - Multivariate data analysis

KW - Numerical classification

KW - Stability of classification

UR - http://www.scopus.com/inward/record.url?scp=85042097350&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85042097350&partnerID=8YFLogxK

U2 - 10.1111/jvs.12604

DO - 10.1111/jvs.12604

M3 - Article

AN - SCOPUS:85042097350

JO - Journal of Vegetation Science

JF - Journal of Vegetation Science

SN - 1100-9233

ER -