Ontologies and tag-statistics

Gergely Tibély, Péter Pollner, T. Vicsek, Gergely Palla

Research output: Contribution to journal › Article

5 Citations (Scopus)

Abstract

Due to the increasing popularity of collaborative tagging systems, research on tagged networks, hypergraphs, ontologies, folksonomies and related concepts is becoming an important interdisciplinary area with great potential and relevance for practical applications. In most collaborative tagging systems, user tagging is completely 'flat', although in some cases users are allowed to define a shallow hierarchy for their own tags. However, usually no overall hierarchical organization of the tags is given, and one of the interesting challenges of this area is to provide an algorithm that generates the ontology of the tags from the available data. In contrast, other types of tagged networks are also available for research, in which the tags are already organized into a directed acyclic graph (DAG) encoding an 'is a sub-category of' hierarchy among them. In this paper, we study how this DAG affects the statistical distribution of tags on the nodes they mark in various real networks. The motivation for this research is that understanding tagging under a known hierarchy can help reveal the hidden hierarchy of tags in collaborative tagging systems. We analyse the relation between tag frequency and the position of the tag in the DAG in two large sub-networks of the English Wikipedia and in a protein-protein interaction network. We also study tag co-occurrence statistics by introducing a two-dimensional (2D) tag-distance distribution that preserves both the difference in levels and the absolute distance in the DAG for co-occurring pairs of tags. Our most interesting finding is that the local relevance of tags in the DAG (i.e. their rank or significance as characterized by, e.g., the length of the branches starting from them) is much more important than their global distance from the root.
Furthermore, we introduce a simple tagging model based on random walks on the DAG, capable of reproducing the main statistical features of tag co-occurrence. This model has high potential for further practical applications; e.g., it can provide the starting point for a benchmark system in ontology retrieval, or it may help pinpoint unusual correlations in the co-occurrence of tags.
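The abstract does not specify how the random-walk tagging model works or how the 2D tag-distance distribution is computed. The Python sketch below is therefore a hypothetical illustration under stated assumptions, not the authors' method. It assumes the following:

* The toy `DAG` and its tag names are invented.
* A tag's level is taken to be its shortest directed distance from the root.
* The "absolute distance" between two tags is taken to be the undirected shortest-path length.
* The tagging rule `random_walk_tags` and its parameters `n_tags`, `p_stop` and `hops` are made up for illustration.

The sketch generates tag sets by random walks on the DAG and then counts co-occurring tag pairs by (level difference, DAG distance).

```python
import random
from collections import Counter, deque
from itertools import combinations

# Hypothetical toy tag hierarchy (parent tag -> sub-category tags), invented
# for illustration; it is not data from the paper.
DAG = {
    "root": ["science", "arts"],
    "science": ["physics", "biology"],
    "arts": ["music"],
    "physics": ["optics"],
    "biology": ["genetics", "optics"],  # "optics" has two parents
    "music": [],
    "optics": [],
    "genetics": [],
}

def tag_levels(dag, root="root"):
    """Assumed level of a tag: shortest directed distance from the root (BFS)."""
    level = {root: 0}
    queue = deque([root])
    while queue:
        tag = queue.popleft()
        for child in dag[tag]:
            if child not in level:
                level[child] = level[tag] + 1
                queue.append(child)
    return level

def undirected_neighbours(dag):
    """Adjacency sets of the DAG with edge directions ignored."""
    nbrs = {tag: set() for tag in dag}
    for parent, children in dag.items():
        for child in children:
            nbrs[parent].add(child)
            nbrs[child].add(parent)
    return nbrs

def tag_distance(nbrs, a, b):
    """Assumed DAG distance: undirected shortest-path length between two tags."""
    dist = {a: 0}
    queue = deque([a])
    while queue:
        tag = queue.popleft()
        if tag == b:
            return dist[tag]
        for nxt in nbrs[tag]:
            if nxt not in dist:
                dist[nxt] = dist[tag] + 1
                queue.append(nxt)
    raise ValueError(f"{a!r} and {b!r} are not connected")

def random_walk_tags(dag, nbrs, rng, n_tags=3, p_stop=0.3, hops=2, root="root"):
    """Hypothetical tagging rule, not the authors' model: walk down from the
    root to a first tag, then add tags reached by short undirected walks."""
    tag = rng.choice(dag[root])
    while dag[tag] and rng.random() > p_stop:  # stop at a leaf or with p_stop
        tag = rng.choice(dag[tag])
    tags = {tag}
    for _ in range(n_tags - 1):
        current = tag
        for _ in range(hops):
            current = rng.choice(sorted(nbrs[current]))  # sorted: reproducible
        if current != root:  # the root itself is not used as a tag
            tags.add(current)
    return tags

def two_d_histogram(items, levels, nbrs):
    """Count co-occurring tag pairs by (|level difference|, DAG distance)."""
    hist = Counter()
    for tags in items:
        for a, b in combinations(sorted(tags), 2):
            hist[(abs(levels[a] - levels[b]), tag_distance(nbrs, a, b))] += 1
    return hist

levels = tag_levels(DAG)
nbrs = undirected_neighbours(DAG)
rng = random.Random(0)
items = [random_walk_tags(DAG, nbrs, rng) for _ in range(1000)]
print(two_d_histogram(items, levels, nbrs).most_common(5))
```

To produce an empirical histogram, one could pass a real category hierarchy as `DAG` and observed tag sets as `items` to `two_d_histogram`. That histogram could then be compared with the output of the random-walk rule. This comparison step is likewise an assumption about usage, not a description of the paper's procedure.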

Original language: English
Article number: 053009
Journal: New Journal of Physics
Volume: 14
DOI: 10.1088/1367-2630/14/5/053009
Publication status: Published - May 2012

ASJC Scopus subject areas

  • Physics and Astronomy(all)

Cite this

Ontologies and tag-statistics. / Tibély, Gergely; Pollner, Péter; Vicsek, T.; Palla, Gergely.

In: New Journal of Physics, Vol. 14, 053009, 05.2012.

Tibély, Gergely ; Pollner, Péter ; Vicsek, T. ; Palla, Gergely. / Ontologies and tag-statistics. In: New Journal of Physics. 2012 ; Vol. 14.
@article{df45f397fe7a4533ba3cf39528e5d749,
title = "Ontologies and tag-statistics",
author = "Gergely Tib{\'e}ly and P{\'e}ter Pollner and T. Vicsek and Gergely Palla",
year = "2012",
month = "5",
doi = "10.1088/1367-2630/14/5/053009",
language = "English",
volume = "14",
pages = "053009",
journal = "New Journal of Physics",
issn = "1367-2630",
publisher = "IOP Publishing Ltd.",

}

TY - JOUR

T1 - Ontologies and tag-statistics

AU - Tibély, Gergely

AU - Pollner, Péter

AU - Vicsek, T.

AU - Palla, Gergely

PY - 2012/5

Y1 - 2012/5

UR - http://www.scopus.com/inward/record.url?scp=84862062822&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84862062822&partnerID=8YFLogxK

U2 - 10.1088/1367-2630/14/5/053009

DO - 10.1088/1367-2630/14/5/053009

M3 - Article

VL - 14

JO - New Journal of Physics

JF - New Journal of Physics

SN - 1367-2630

M1 - 053009

ER -