Semi-supervised classification of vegetation: Preserving the good old units and searching for new ones

Lubomír Tichý, Milan Chytrý, Z. Botta-Dukát

Research output: Contribution to journal › Article

27 Citations (Scopus)

Abstract

Aim: The unsupervised nature of traditional numerical methods used to classify vegetation hinders the development of comprehensive vegetation classification systems. Each new unsupervised classification yields partitions that are partly inconsistent with previous classifications and changes group membership for some sites. In contrast, supervised methods account for previously established vegetation units but cannot define new ones. Therefore, we introduce the concept of semi-supervised classification to community ecology and vegetation science. Semi-supervised classification formally reproduces the existing units in a supervised mode and simultaneously identifies new units among unassigned sites in an unsupervised mode. We discuss the concept of semi-supervised clustering; introduce semi-supervised variants of two clustering algorithms that produce groups with crisp boundaries, k-means and partitioning around medoids (PAM); provide a free software tool to perform these classifications; and demonstrate the advantages using example data sets of vegetation plots.

Methods: Semi-supervised methods use a priori information about group membership for some sites to define centroids (k-means) or medoids (PAM) of site groups that represent previously established vegetation units. They identify these groups in a species hyperspace and assign new sites to them. At the same time, they search for a user-defined number of new groups. We compared the unsupervised, supervised and semi-supervised methods using an example of a forest vegetation data set that was previously classified using expert knowledge, and assessed how well these methods reproduced vegetation units defined by experts. Then we compared supervised and semi-supervised methods in a task in which a grassland vegetation classification established in one country was extended to two neighbouring countries.

Results and conclusions: Example analyses of vegetation plot data sets demonstrated that semi-supervised variants of k-means and PAM are extremely valuable tools for extending existing vegetation classifications while preserving previously defined vegetation units. They can be used both for identifying previously unrecognized vegetation types in regions where a vegetation classification already exists and for extending a vegetation classification from a particular region to neighbouring regions with partly identical but partly different vegetation types. Both k-means and PAM provide site groups with crisp boundaries, which makes them a simpler alternative to fuzzy clustering methods.
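
The semi-supervised k-means idea described in the Methods can be illustrated with a minimal sketch. This is not the authors' software tool; it is an illustration under stated assumptions: sites are compared by Euclidean distance, centroids of the previously established units are computed once from the a priori labelled sites and then held fixed, and labelled sites keep their original group. The function name `semi_supervised_kmeans` and its parameters are hypothetical.

```python
import numpy as np


def semi_supervised_kmeans(X, labels, n_new, n_iter=100, seed=0):
    """Illustrative semi-supervised k-means (hypothetical helper).

    X      : (n_sites, n_species) array of plot data.
    labels : integer array; >= 0 for sites with an a priori group,
             -1 for unassigned sites. At least one site must be labelled.
    n_new  : user-defined number of new groups to search for.
    Returns an integer array with a group id for every site.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    known = np.unique(labels[labels >= 0])

    # Assumption: centroids of established units are computed once
    # from the labelled sites and never moved afterwards.
    fixed = np.array([X[labels == g].mean(axis=0) for g in known])

    free_idx = np.flatnonzero(labels < 0)
    # New centroids start at randomly chosen unassigned sites.
    start = rng.choice(free_idx, size=n_new, replace=False)
    new = X[start].copy()

    assign = None
    for _ in range(n_iter):
        centroids = np.vstack([fixed, new])
        # Distance of every unassigned site to every centroid.
        diff = X[free_idx, None, :] - centroids[None, :, :]
        nearest = np.linalg.norm(diff, axis=2).argmin(axis=1)
        if assign is not None and np.array_equal(nearest, assign):
            break  # assignments stopped changing
        assign = nearest
        # Only the new (unsupervised) centroids are updated.
        for j in range(n_new):
            members = free_idx[assign == len(known) + j]
            if members.size:
                new[j] = X[members].mean(axis=0)

    # Established groups keep their ids; new groups get the next free ids.
    group_ids = np.concatenate([known, known.max() + 1 + np.arange(n_new)])
    out = labels.copy()
    out[free_idx] = group_ids[assign]
    return out
```

Holding the old centroids fixed is what keeps the established units stable in this sketch: an unassigned site can join an old unit, but it cannot shift that unit's position. Setting `n_new=0` reduces the sketch to a purely supervised nearest-centroid assignment. A PAM variant would follow the same pattern with medoids (actual sites) in place of means.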

Original language: English
Pages (from-to): 1504-1512
Number of pages: 9
Journal: Journal of Vegetation Science
Volume: 25
Issue number: 6
DOI: 10.1111/jvs.12193
Publication status: Published - Nov 1 2014

Keywords

  • Classification stability
  • Clustering
  • Data analysis
  • k-means
  • Partitioning around medoids
  • Phytosociology
  • Plant community ecology
  • Vegetation type

ASJC Scopus subject areas

  • Ecology
  • Plant Science

Cite this

Semi-supervised classification of vegetation: Preserving the good old units and searching for new ones. / Tichý, Lubomír; Chytrý, Milan; Botta-Dukát, Z.

In: Journal of Vegetation Science, Vol. 25, No. 6, 01.11.2014, p. 1504-1512.

@article{45c300c5d7a5467a9812a49181d8cd7f,
title = "Semi-supervised classification of vegetation: Preserving the good old units and searching for new ones",
keywords = "Classification stability, Clustering, Data analysis, k-means, Partitioning around medoids, Phytosociology, Plant community ecology, Vegetation type",
author = "Lubom{\'i}r Tich{\'y} and Milan Chytr{\'y} and Z. Botta-Duk{\'a}t",
year = "2014",
month = "11",
day = "1",
doi = "10.1111/jvs.12193",
language = "English",
volume = "25",
pages = "1504--1512",
journal = "Journal of Vegetation Science",
issn = "1100-9233",
publisher = "Wiley-Blackwell",
number = "6",

}

TY - JOUR
T1 - Semi-supervised classification of vegetation
T2 - Preserving the good old units and searching for new ones
AU - Tichý, Lubomír
AU - Chytrý, Milan
AU - Botta-Dukát, Z.
PY - 2014/11/1
Y1 - 2014/11/1
KW - Classification stability
KW - Clustering
KW - Data analysis
KW - k-means
KW - Partitioning around medoids
KW - Phytosociology
KW - Plant community ecology
KW - Vegetation type
UR - http://www.scopus.com/inward/record.url?scp=84925298742&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=84925298742&partnerID=8YFLogxK
U2 - 10.1111/jvs.12193
DO - 10.1111/jvs.12193
M3 - Article
AN - SCOPUS:84925298742
VL - 25
SP - 1504
EP - 1512
JO - Journal of Vegetation Science
JF - Journal of Vegetation Science
SN - 1100-9233
IS - 6
ER -