A multi-terabyte relational database for geo-tagged social network data

Laszlo Dobos, Janos Szule, Tamas Bodnar, Tamas Hanyecz, Tamas Sebok, Daniel Kondor, Zsofia Kallus, Jozsef Steger, Istvan Csabai, Gabor Vattay

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

14 Citations (Scopus)

Abstract

Despite their relatively low sampling factor, the freely available, randomly sampled status streams of Twitter are very useful sources of geographically embedded social network data. To statistically analyze the information Twitter provides via these streams, we have collected a year's worth of data and built a multi-terabyte relational database from it. The database is designed to load data quickly and to support a wide range of studies focusing on the statistics and geographic features of social networks, as well as on the linguistic analysis of tweets. In this paper we present the method of data collection, the database design, the data loading procedure, and the special treatment of geo-tagged and multilingual data. We also provide some SQL recipes for computing network statistics.

Original language: English
Title of host publication: 4th IEEE International Conference on Cognitive Infocommunications, CogInfoCom 2013 - Proceedings
Publisher: IEEE Computer Society
Pages: 289-294
Number of pages: 6
ISBN (Print): 9781479915439
DOIs: https://doi.org/10.1109/CogInfoCom.2013.6719259
Publication status: Published - Jan 1 2013
Event: 4th IEEE International Conference on Cognitive Infocommunications, CogInfoCom 2013 - Budapest, Hungary
Duration: Dec 2 2013 to Dec 5 2013

Publication series

Name: 4th IEEE International Conference on Cognitive Infocommunications, CogInfoCom 2013 - Proceedings

Other

Other: 4th IEEE International Conference on Cognitive Infocommunications, CogInfoCom 2013
Country: Hungary
City: Budapest
Period: 12/2/13 to 12/5/13

ASJC Scopus subject areas

  • Artificial Intelligence
  • Information Systems

Cite this

Dobos, L., Szule, J., Bodnar, T., Hanyecz, T., Sebok, T., Kondor, D., Kallus, Z., Steger, J., Csabai, I., & Vattay, G. (2013). A multi-terabyte relational database for geo-tagged social network data. In 4th IEEE International Conference on Cognitive Infocommunications, CogInfoCom 2013 - Proceedings (pp. 289-294). [6719259] (4th IEEE International Conference on Cognitive Infocommunications, CogInfoCom 2013 - Proceedings). IEEE Computer Society. https://doi.org/10.1109/CogInfoCom.2013.6719259