Modified Gath-Geva clustering for fuzzy segmentation of multivariate time-series

J. Abonyi, Balazs Feil, Sandor Nemeth, Peter Arva

Research output: Contribution to journal › Article

96 Citations (Scopus)

Abstract

Partitioning a time-series into internally homogeneous segments is an important data-mining problem. The changes of the variables of a multivariate time-series are usually vague and are not concentrated at any particular time point. Therefore, it is not practical to define crisp bounds of the segments. Although fuzzy clustering algorithms are widely used to group overlapping and vague objects, they cannot be directly applied to time-series segmentation, because the clusters need to be contiguous in time. This paper proposes a clustering algorithm for the simultaneous identification of local probabilistic principal component analysis (PPCA) models, used to measure the homogeneity of the segments, and fuzzy sets, used to represent the segments in time. The algorithm favors contiguous clusters in time and is able to detect changes in the hidden structure of multivariate time-series. A fuzzy decision-making algorithm based on a compatibility criterion of the clusters has been worked out to determine the required number of segments, while the required number of principal components is determined by the scree plots of the eigenvalues of the fuzzy covariance matrices. The application example shows that this new technique is a useful tool for the analysis of historical process data.
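The idea of representing segments by fuzzy sets in time can be illustrated with a minimal sketch. It assumes Gaussian-shaped membership functions over the time axis, normalized so that the memberships at each time point sum to one; the function name `time_memberships`, the segment centers, and the widths are hypothetical choices for illustration and are not taken from the abstract. The sketch covers only the time-domain fuzzy sets, not the PPCA models or the clustering iterations.

```python
import numpy as np

def time_memberships(t, centers, widths):
    """Fuzzy segment memberships from Gaussian functions of time.

    Segment i gets the unnormalized score exp(-(t - c_i)^2 / (2 s_i^2)).
    Scores are then normalized so that, at every time point, the
    memberships over all segments sum to 1.
    """
    t = np.asarray(t, dtype=float)[:, None]              # shape (N, 1)
    centers = np.asarray(centers, dtype=float)[None, :]  # shape (1, c)
    widths = np.asarray(widths, dtype=float)[None, :]    # shape (1, c)
    scores = np.exp(-0.5 * ((t - centers) / widths) ** 2)
    return scores / scores.sum(axis=1, keepdims=True)

# Illustrative values only: 100 samples, three segments of equal width.
t = np.arange(100)
U = time_memberships(t, centers=[20, 50, 80], widths=[10, 10, 10])

# A crisp segmentation, if one is needed, follows from the largest membership.
hard_labels = U.argmax(axis=1)
```

Because each membership function is unimodal in time, the resulting segments overlap softly near their boundaries while remaining contiguous, which is the property the abstract asks of the clusters.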

Original language: English
Pages (from-to): 39-56
Number of pages: 18
Journal: Fuzzy Sets and Systems
Volume: 149
Issue number: 1
DOI: 10.1016/j.fss.2004.07.008
Publication status: Published - Jan 1 2005

Fingerprint

Multivariate Time Series
Time series
Segmentation
Clustering
Clustering algorithms
Fuzzy clustering
Fuzzy Decision Making
Fuzzy Matrix
Fuzzy sets
Covariance matrix
Principal component analysis
Probabilistic Analysis
Fuzzy Algorithm
Data mining
Principal Components
Homogeneity
Decision making
Compatibility

Keywords

  • Fuzzy clustering
  • Process monitoring
  • Time-series segmentation

ASJC Scopus subject areas

  • Statistics and Probability
  • Electrical and Electronic Engineering
  • Statistics, Probability and Uncertainty
  • Information Systems and Management
  • Computer Vision and Pattern Recognition
  • Computer Science Applications
  • Artificial Intelligence

Cite this

Modified Gath-Geva clustering for fuzzy segmentation of multivariate time-series. / Abonyi, J.; Feil, Balazs; Nemeth, Sandor; Arva, Peter.

In: Fuzzy Sets and Systems, Vol. 149, No. 1, 01.01.2005, p. 39-56.

@article{9d2641f5466f46afb633aee42ee753e4,
title = "Modified Gath-Geva clustering for fuzzy segmentation of multivariate time-series",
keywords = "Fuzzy clustering, Process monitoring, Time-series segmentation",
author = "J. Abonyi and Balazs Feil and Sandor Nemeth and Peter Arva",
year = "2005",
month = "1",
day = "1",
doi = "10.1016/j.fss.2004.07.008",
language = "English",
volume = "149",
pages = "39--56",
journal = "Fuzzy Sets and Systems",
issn = "0165-0114",
publisher = "Elsevier",
number = "1",

}

TY - JOUR

T1 - Modified Gath-Geva clustering for fuzzy segmentation of multivariate time-series

AU - Abonyi, J.

AU - Feil, Balazs

AU - Nemeth, Sandor

AU - Arva, Peter

PY - 2005/1/1

Y1 - 2005/1/1

KW - Fuzzy clustering

KW - Process monitoring

KW - Time-series segmentation

UR - http://www.scopus.com/inward/record.url?scp=9644270328&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=9644270328&partnerID=8YFLogxK

U2 - 10.1016/j.fss.2004.07.008

DO - 10.1016/j.fss.2004.07.008

M3 - Article

VL - 149

SP - 39

EP - 56

JO - Fuzzy Sets and Systems

JF - Fuzzy Sets and Systems

SN - 0165-0114

IS - 1

ER -