Assessing the relative importance of methodological decisions in classifications of vegetation data

Attila Lengyel, J. Podaní

Research output: Contribution to journal (Article)

10 Citations (Scopus)

Abstract

Questions: What is the relative importance of methodological decisions concerning sampling (plot size) and data analysis (data transformation, resemblance coefficient, hierarchical clustering strategy and number of clusters) in vegetation classification? Do the conclusions differ when the full range of methodological choices is tested rather than only a narrower, more practical range? How do the results differ between actual and random data?

Location: Rock grassland in Hungary.

Methods: The full procedure of vegetation classification was simulated using actual and random data. Variation in classification results was partitioned using distance-based redundancy analysis (RDA). The RDA models were subjected to variation partitioning to determine the relative importance of methodological decisions.

Results: RDA models explained more variation in classifications of random data than of real data. Classification algorithm, cluster level, data transformation and mean plot size were always among the most significant variables; however, the other variables also had a considerable effect in certain situations.

Conclusions: As the adjusted R² values suggest, the overall effect of methodological decisions on classifications is larger for randomly structured data than for actual data, possibly because of a stronger clustering tendency in the latter. The clustering algorithm, cluster level, data transformation and plot size should be chosen most carefully before classification analyses, but any of the examined decisions can significantly affect the result. In addition to the mean, the range of plot sizes should also be carefully delimited during relevé selection for classification studies. The main decision about the classification algorithm is whether a chain-forming or a group-forming method is used. The data transformation had a more significant effect on real data than on simulations with random variation, which supports the view that different abundance scales can reveal different facets of biologically relevant patterns in community composition. The resemblance measure had a relatively weak effect, suggesting that it is not as influential as previously thought.
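The kind of simulation the abstract describes can be illustrated with a minimal sketch. It is not the authors' implementation: the toy plot data, the three transformations, the two resemblance measures, the naive clustering routine and the use of 1 minus the Rand index to compare classifications are all assumptions made for illustration. Single linkage stands in for a chain-forming method and complete linkage for a group-forming one. The sketch stops at the pairwise dissimilarities among classifications and does not attempt the distance-based redundancy analysis or the variation partitioning.

```python
import itertools
import math
import random

def transform(plots, kind):
    """Apply an abundance transformation: raw, square root, or presence/absence."""
    if kind == "raw":
        return [list(p) for p in plots]
    if kind == "sqrt":
        return [[math.sqrt(x) for x in p] for p in plots]
    if kind == "pa":
        return [[1.0 if x > 0 else 0.0 for x in p] for p in plots]
    raise ValueError(f"unknown transformation: {kind}")

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity; defined as 0 when both plots are empty."""
    total = sum(a) + sum(b)
    return sum(abs(x - y) for x, y in zip(a, b)) / total if total else 0.0

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def agglomerate(plots, dist, linkage, k):
    """Naive agglomerative clustering cut at k clusters.

    linkage=min gives single linkage, linkage=max gives complete linkage.
    """
    n = len(plots)
    d = [[dist(plots[i], plots[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > k:
        best = None
        for a, b in itertools.combinations(range(len(clusters)), 2):
            link = linkage(d[i][j] for i in clusters[a] for j in clusters[b])
            if best is None or link < best[0]:
                best = (link, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge the closest pair
        del clusters[b]                          # b > a, so index a stays valid
    labels = [0] * n
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

def rand_index(x, y):
    """Share of plot pairs that both classifications treat the same way."""
    pairs = list(itertools.combinations(range(len(x)), 2))
    agree = sum((x[i] == x[j]) == (y[i] == y[j]) for i, j in pairs)
    return agree / len(pairs)

# Hypothetical toy data: 12 plots by 8 species with arbitrary cover values.
rng = random.Random(0)
plots = [[rng.choice([0, 0, 1, 2, 5, 20]) for _ in range(8)] for _ in range(12)]

# Every combination of the methodological decisions under study.
settings = itertools.product(
    ["raw", "sqrt", "pa"],
    [("bray_curtis", bray_curtis), ("euclidean", euclidean)],
    [("single", min), ("complete", max)],
    [2, 3, 4],
)
results = {}
for kind, (dname, dist), (lname, linkage), k in settings:
    labels = agglomerate(transform(plots, kind), dist, linkage, k)
    results[(kind, dname, lname, k)] = labels

# Pairwise dissimilarity between classifications (1 - Rand index).
dissimilarity = {
    (a, b): 1 - rand_index(results[a], results[b])
    for a, b in itertools.combinations(list(results), 2)
}
```

Each key of `results` records one combination of decisions, so the `dissimilarity` values can be related back to the decisions that produced each classification.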

Original language: English
Pages (from-to): 804-815
Number of pages: 12
Journal: Journal of Vegetation Science
Volume: 26
Issue number: 4
DOI: 10.1111/jvs.12268
Publication status: Published - Jul 1 2015

Keywords

  • Data transformation
  • Flexible clustering
  • Model selection
  • Multivariate analysis
  • Plot size
  • Resemblance measure

ASJC Scopus subject areas

  • Ecology
  • Plant Science

Cite this

Assessing the relative importance of methodological decisions in classifications of vegetation data. / Lengyel, Attila; Podaní, J.

In: Journal of Vegetation Science, Vol. 26, No. 4, 01.07.2015, p. 804-815.

@article{d8b708676e84422c82484908846547cf,
title = "Assessing the relative importance of methodological decisions in classifications of vegetation data",
keywords = "Data transformation, Flexible clustering, Model selection, Multivariate analysis, Plot size, Resemblance measure",
author = "Attila Lengyel and J. Podan{\'i}",
year = "2015",
month = "7",
day = "1",
doi = "10.1111/jvs.12268",
language = "English",
volume = "26",
pages = "804--815",
journal = "Journal of Vegetation Science",
issn = "1100-9233",
publisher = "Wiley-Blackwell",
number = "4",

}

TY - JOUR

T1 - Assessing the relative importance of methodological decisions in classifications of vegetation data

AU - Lengyel, Attila

AU - Podaní, J.

PY - 2015/7/1

Y1 - 2015/7/1

KW - Data transformation

KW - Flexible clustering

KW - Model selection

KW - Multivariate analysis

KW - Plot size

KW - Resemblance measure

UR - http://www.scopus.com/inward/record.url?scp=84931566541&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84931566541&partnerID=8YFLogxK

U2 - 10.1111/jvs.12268

DO - 10.1111/jvs.12268

M3 - Article

VL - 26

SP - 804

EP - 815

JO - Journal of Vegetation Science

JF - Journal of Vegetation Science

SN - 1100-9233

IS - 4

ER -