Gossip learning with linear models on fully distributed data

Róbert Ormándi, István Hegedűs, M. Jelasity

Research output: Contribution to journal › Article

29 Citations (Scopus)

Abstract

Machine learning over fully distributed data poses an important problem in peer-to-peer (P2P) applications. In this model, each network node holds a single data record, and raw data cannot be moved between nodes because of privacy considerations; user profiles, ratings, histories, and sensor readings are typical examples. The problem is difficult because no node can train a local model on its own single record, the system model offers almost no reliability guarantees, and yet the communication cost needs to be kept low. Here, we propose gossip learning, a generic approach in which multiple models take random walks over the network in parallel, improve themselves by applying an online learning algorithm at each visited node, and are combined via ensemble learning methods. We present an instantiation of this approach for classification with linear models. Our main contribution is an ensemble learning method that, through the continuous combination of the models in the network, implements a virtual weighted voting mechanism over an exponential number of models at practically no extra cost compared with independent random walks. We prove the convergence of the method theoretically and perform extensive experiments on benchmark data sets. Our experimental analysis demonstrates the performance and robustness of the proposed approach.
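The abstract describes the protocol only at a high level. The following is a minimal single-process sketch of the idea, assuming an averaging merge rule and a Pegasos-style stochastic gradient step for the linear model; node counts, names, and parameters are illustrative assumptions, not the authors' implementation.

import random

class LinearModel:
    """Linear classifier trained online with a Pegasos-style SGD update (assumed here)."""
    def __init__(self, dim):
        self.w = [0.0] * dim
        self.age = 0  # number of training examples seen so far

    def update(self, x, y, lam=0.01):
        # One regularized SGD step on a single example; y must be -1 or +1.
        self.age += 1
        eta = 1.0 / (lam * self.age)  # decreasing learning rate
        margin = y * sum(wi * xi for wi, xi in zip(self.w, x))
        self.w = [(1.0 - eta * lam) * wi for wi in self.w]
        if margin < 1.0:  # hinge loss is active
            self.w = [wi + eta * y * xi for wi, xi in zip(self.w, x)]

    def predict(self, x):
        return 1 if sum(wi * xi for wi, xi in zip(self.w, x)) >= 0 else -1

def merge(a, b):
    # Average the weight vectors of two models; repeated merging is what
    # implicitly combines an exponential number of ancestor models.
    m = LinearModel(len(a.w))
    m.w = [(wa + wb) / 2.0 for wa, wb in zip(a.w, b.w)]
    m.age = max(a.age, b.age)
    return m

def gossip_round(models, data):
    # Every node sends its model to one random peer (a random-walk step);
    # the receiver merges it with its own model, then takes one online
    # learning step on its single local record (x, y).
    n = len(models)
    for sender in range(n):
        receiver = random.randrange(n)
        models[receiver] = merge(models[receiver], models[sender])
        x, y = data[receiver]
        models[receiver].update(x, y)

# Toy run: 100 nodes, one 2-D record each, labels given by sign(x0 - x1).
random.seed(1)
data = []
for _ in range(100):
    x = [random.uniform(-1, 1), random.uniform(-1, 1)]
    data.append((x, 1 if x[0] - x[1] >= 0 else -1))
models = [LinearModel(2) for _ in range(100)]
for _ in range(50):
    gossip_round(models, data)
errors = sum(m.predict(x) != y for m, (x, y) in zip(models, data))
print("misclassified local records:", errors)

In this sketch, averaging on every exchange is what realizes the virtual weighted voting the abstract refers to: each node's weight vector is a convex combination of the weight vectors of all ancestor models, so a single prediction implicitly aggregates exponentially many random-walk models at no extra communication cost.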

Original language: English
Pages (from-to): 556-571
Number of pages: 16
Journal: Concurrency and Computation: Practice and Experience
Volume: 25
Issue number: 4
DOIs: 10.1002/cpe.2858
Publication status: Published - 2013

Keywords

  • bagging
  • gossip
  • online learning
  • P2P
  • random walk
  • stochastic gradient descent

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications
  • Software
  • Computational Theory and Mathematics
  • Theoretical Computer Science

Cite this

Gossip learning with linear models on fully distributed data. / Ormándi, Róbert; Hegedűs, István; Jelasity, M.

In: Concurrency and Computation: Practice and Experience, Vol. 25, No. 4, 2013, p. 556-571.

Research output: Contribution to journal › Article

@article{c4a9438bc7904f809cc8295e8c3dec8d,
title = "Gossip learning with linear models on fully distributed data",
keywords = "bagging, gossip, online learning, P2P, random walk, stochastic gradient descent",
author = "R{\'o}bert Orm{\'a}ndi and Istv{\'a}n Heged{\H u}s and M. Jelasity",
year = "2013",
doi = "10.1002/cpe.2858",
language = "English",
volume = "25",
pages = "556--571",
journal = "Concurrency and Computation: Practice and Experience",
issn = "1532-0626",
publisher = "John Wiley and Sons Ltd",
number = "4",
}
