Massively distributed concept drift handling in large networks

István Hegedűs, Róbert Ormándi, Márk Jelasity

Research output: Contribution to journal › Article


Abstract

Massively distributed data mining in large networks, such as smart device platforms and peer-to-peer systems, is a rapidly developing research area. One important problem here is concept drift, where global data patterns (movement, preferences, activities, etc.) change according to the actual set of participating users, the weather, or the time of day, or as a result of events such as accidents or even natural catastrophes. In one important case, when the network is very large but only a few training samples can be obtained locally at each node, no efficient distributed solution is known that can follow concept drift. This case is characteristic of smart device platforms, where each device stores only one local observation or data record related to a learning problem. Here we present two algorithms to handle concept drift. Neither algorithm collects data at a central location; instead, models of the data perform random walks in the network while being improved using an online learning algorithm. The first algorithm achieves adaptivity by maintaining young as well as old models in the network according to a fixed age distribution. The second measures the performance of models locally and discards them if they are judged outdated. We demonstrate through a thorough experimental analysis that our algorithms outperform the known competing methods if the number of independent local samples is limited relative to the speed of drift, which is a typical scenario in our targeted application domains. The two algorithms have different strengths: while the age distribution approach is very simple and efficient, explicit drift detection can be useful in monitoring applications to trigger control action.
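The mechanism described in the abstract (models random-walking the network, trained online on the single record stored at each node, with two ways of staying current under drift) can be illustrated by a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation. The following choices are all assumptions introduced for illustration: a bias-free perceptron stands in for "an online learning algorithm"; each random-walk hop lands on a uniformly random node (a stand-in for a random walk over a well-mixed overlay); the fixed age distribution is approximated by restarting each model with a constant probability `p_restart` per hop, which keeps ages roughly geometric; and local performance is tracked as an exponential moving average of the 0/1 error, with a model discarded once that estimate exceeds `threshold`. The names `WalkingModel`, `gossip_round`, `make_data` and all parameter values are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class WalkingModel:
    """A model that random-walks the network (hypothetical structure)."""
    w: list           # linear model weights
    age: int = 0      # number of local updates since the model was (re)started
    err: float = 0.0  # running (EMA) estimate of the 0/1 error seen on visited nodes


def predict(w, x):
    """Linear classifier without bias: returns +1 or -1."""
    s = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if s >= 0 else -1


def perceptron_update(w, x, y, lr=1.0):
    """One online perceptron step: w changes only if (x, y) is misclassified."""
    if predict(w, x) != y:
        return [wi + lr * y * xi for wi, xi in zip(w, x)]
    return list(w)


def reset(model, dim):
    """Discard the model and start a fresh, untrained one in its place."""
    model.w = [0.0] * dim
    model.age = 0
    model.err = 0.0


def gossip_round(models, data, rng, strategy,
                 p_restart=0.02, alpha=0.2, threshold=0.35, min_age=10):
    """Move every model one hop and train it on the local record it lands on.

    strategy: "none", "age" (random restarts, so ages stay roughly geometric),
    or "detect" (discard when the locally measured error gets too high).
    Returns the number of models that were reset in this round.
    """
    dim = len(data[0][0])
    resets = 0
    for m in models:
        # Random-walk hop: assumed to land on a uniformly random node,
        # whose single local record (x, y) is then used for training.
        x, y = data[rng.randrange(len(data))]
        if strategy == "age" and rng.random() < p_restart:
            reset(m, dim)
            resets += 1
        elif strategy == "detect":
            # Test the model on the local record *before* learning from it.
            loss = 1.0 if predict(m.w, x) != y else 0.0
            m.err = (1 - alpha) * m.err + alpha * loss
            if m.age >= min_age and m.err > threshold:
                reset(m, dim)
                resets += 1
        m.w = perceptron_update(m.w, x, y)
        m.age += 1
    return resets


def make_data(n, concept, rng):
    """One labelled record per node; concepts 0 and 1 use different boundaries."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
        s = x[0] + x[1] if concept == 0 else x[0] - x[1]
        data.append((x, 1 if s >= 0 else -1))
    return data


def mean_accuracy(models, test):
    """Average accuracy of all models in the network on a test set."""
    hits = sum(predict(m.w, x) == y for m in models for x, y in test)
    return hits / (len(models) * len(test))


if __name__ == "__main__":
    for strategy in ("none", "age", "detect"):
        rng = random.Random(1)
        n = 200
        data = make_data(n, 0, rng)
        models = [WalkingModel([0.0, 0.0]) for _ in range(n)]
        for _ in range(50):
            gossip_round(models, data, rng, strategy)
        data = make_data(n, 1, rng)  # drift: every local record now follows concept 1
        for _ in range(20):
            gossip_round(models, data, rng, strategy)
        acc = mean_accuracy(models, make_data(500, 1, rng))
        print(strategy, round(acc, 3))
```

Running the file prints the post-drift accuracy of each strategy on this toy problem. The parameters are arbitrary, so the printed numbers should not be read as reproducing the paper's experimental comparison. The design point the sketch tries to show is that neither strategy needs a central server: the "age" variant only needs a local coin flip, while the "detect" variant evaluates the visiting model on the host node's record before training on it, which is what makes an explicit drift signal available for monitoring.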

Original language: English
Article number: 1350021
Journal: Advances in Complex Systems
Volume: 16
Issue number: 4-5
Publication status: Published (Aug 2013)


Keywords

  • Adaptive classification
  • P2P
  • concept drift
  • gossip learning

ASJC Scopus subject areas

  • Control and Systems Engineering
