Breaking the hierarchy - A new cluster selection mechanism for hierarchical clustering methods

László A. Zahoránszky, Gyula Y. Katona, Péter Hári, A. Málnási-Csizmadia, Katharina A. Zweig, Gergely Zahoránszky-Köhalmi

Research output: Contribution to journal (article)

11 Citations (Scopus)

Abstract

Background: Hierarchical clustering methods such as Ward's method have been used for decades to understand biological and chemical data sets. To obtain a partition of the data set, an optimal level of the hierarchy must be chosen by a so-called level selection algorithm. In 2005, Palla et al. introduced a new kind of hierarchical clustering method that differs from Ward's method in two ways: it can be used on data for which no full similarity matrix is defined, and it can produce overlapping clusters, i.e., it allows items to belong to multiple clusters. These features are well suited to biological and chemical data sets, but until now no level selection algorithm has been published for this method.

Results: In this article we provide a general selection scheme, the level-independent clustering selection method, called LInCS. With it, clusters can be selected from any level in quadratic time with respect to the number of clusters. Since hierarchically clustered data are not necessarily associated with a similarity measure, the selection is based on a graph-theoretic notion of cohesive clusters. We present results of our method on two data sets: a set of drug-like molecules and a set of protein-protein interaction (PPI) data. In both cases the method provides a clustering with very good sensitivity and specificity values with respect to a given reference clustering. Moreover, we show for the PPI data set that our graph-theoretic cohesiveness measure indeed chooses biologically homogeneous clusters and disregards inhomogeneous ones in most cases. Finally, we discuss how the method can be generalized to other hierarchical clustering methods to allow for level-independent cluster selection.

Conclusion: Using our new cluster selection method together with the method by Palla et al. provides a new and interesting clustering mechanism that can compute overlapping clusters, which is especially valuable for biological and chemical data sets.
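To illustrate what overlapping clusters on a graph without a full similarity matrix can look like, here is a minimal Python sketch of k-clique percolation, the idea usually associated with the method by Palla et al. It is not the LInCS selection scheme and not the authors' implementation. The function name `k_clique_communities`, the brute-force clique enumeration, and the toy edge list are assumptions made only for this example.

```python
from itertools import combinations


def k_clique_communities(edges, k):
    """Return overlapping communities (frozensets of nodes) for a given k.

    Hypothetical helper: a community is the union of all k-cliques that can
    be reached from one another through k-cliques sharing k-1 nodes.
    """
    # Build an undirected adjacency map from the edge list.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Enumerate all k-cliques by brute force (acceptable for toy graphs only).
    cliques = [
        frozenset(c)
        for c in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    # Union-find over cliques: merge two cliques that share k-1 nodes.
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # Each community is the union of the nodes in one group of cliques.
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


# Toy graph (illustrative data, not from the paper): a 4-clique on nodes 1-4
# and a triangle on nodes 4-6 that touches it only at node 4.
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
         (4, 5), (4, 6), (5, 6)]

for k in (2, 3, 4):
    print(k, [sorted(c) for c in k_clique_communities(edges, k)])
```

On this toy graph, k = 2 yields a single cluster containing all six nodes, k = 3 yields the two clusters {1, 2, 3, 4} and {4, 5, 6}, which overlap in node 4, and k = 4 yields only {1, 2, 3, 4}. Increasing k therefore produces a hierarchy of progressively more cohesive clusters, and a level-independent selection scheme would be free to pick clusters from different values of k.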

Original language: English
Article number: 12
Journal: Algorithms for Molecular Biology
Volume: 4
Issue number: 1
DOI: 10.1186/1748-7188-4-12
Publication status: Published - Oct 19 2009

Fingerprint

Hierarchical Clustering
Clustering Methods
Cluster Analysis
Proteins
Clustering
Protein-protein Interaction
Overlapping
Choose
Molecules
Hierarchy
Clustered Data
Number of Clusters
Graph in graph theory
Similarity Measure
Specificity
Drugs
Partition
Datasets
Necessary

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Applied Mathematics
  • Molecular Biology
  • Structural Biology

Cite this

Breaking the hierarchy - A new cluster selection mechanism for hierarchical clustering methods. / Zahoránszky, László A.; Katona, Gyula Y.; Hári, Péter; Málnási-Csizmadia, A.; Zweig, Katharina A.; Zahoránszky-Köhalmi, Gergely.

In: Algorithms for Molecular Biology, Vol. 4, No. 1, 12, 19.10.2009.

@article{bd1aa63a0e924aa3aaad1522064347b7,
title = "Breaking the hierarchy - A new cluster selection mechanism for hierarchical clustering methods",
author = "Zahor{\'a}nszky, {L{\'a}szl{\'o} A.} and Katona, {Gyula Y.} and H{\'a}ri, P{\'e}ter and M{\'a}ln{\'a}si-Csizmadia, A. and Zweig, {Katharina A.} and Zahor{\'a}nszky-K{\"o}halmi, Gergely",
year = "2009",
month = "10",
day = "19",
doi = "10.1186/1748-7188-4-12",
language = "English",
volume = "4",
journal = "Algorithms for Molecular Biology",
issn = "1748-7188",
publisher = "BioMed Central",
number = "1",

}

TY - JOUR

T1 - Breaking the hierarchy - A new cluster selection mechanism for hierarchical clustering methods

AU - Zahoránszky, László A.

AU - Katona, Gyula Y.

AU - Hári, Péter

AU - Málnási-Csizmadia, A.

AU - Zweig, Katharina A.

AU - Zahoránszky-Köhalmi, Gergely

PY - 2009/10/19

Y1 - 2009/10/19

UR - http://www.scopus.com/inward/record.url?scp=71149085584&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=71149085584&partnerID=8YFLogxK

U2 - 10.1186/1748-7188-4-12

DO - 10.1186/1748-7188-4-12

M3 - Article

VL - 4

JO - Algorithms for Molecular Biology

JF - Algorithms for Molecular Biology

SN - 1748-7188

IS - 1

M1 - 12

ER -