ε-MDPs: Learning in varying environments

István Szita, Bálint Takács, András Lőrincz

Research output: Contribution to journal › Article

24 Citations (Scopus)

Abstract

In this paper ε-MDP models are introduced and convergence theorems are proven using the generalized MDP framework of Szepesvári and Littman. Using this model family, we show that Q-learning is capable of finding near-optimal policies in varying environments. The potential of this new family of MDP models is illustrated via a reinforcement learning algorithm called event-learning, which separates the optimization of decision making from the controller. We show that event-learning augmented by a particular controller, which gives rise to an ε-MDP, enables near-optimal performance even when considerable and sudden changes occur in the environment. Illustrations are provided on the two-segment pendulum problem.
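The abstract concerns convergence of Q-learning when the environment may vary within an ε bound. As a rough illustration of that setting only, and not of the paper's event-learning algorithm or SDS controller, the sketch below runs ordinary tabular Q-learning on a hypothetical chain environment whose transition probabilities are re-perturbed by up to ε at every step. The class name `EpsilonPerturbedChain`, the chain dynamics, and all parameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Hypothetical toy environment (an assumption for illustration, not from the paper):
# a small chain whose transition probabilities are perturbed within an epsilon-ball
# of a nominal model at every step, mimicking a "varying environment".
class EpsilonPerturbedChain:
    def __init__(self, n_states=5, eps=0.05, seed=0):
        self.n_states = n_states
        self.eps = eps
        self.rng = np.random.default_rng(seed)
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Nominal dynamics: action 1 moves right with prob 0.8, action 0 with prob 0.2.
        p_forward = 0.8 if action == 1 else 0.2
        # Perturb the transition probability within +/- eps (the "varying" part).
        p_forward = np.clip(p_forward + self.rng.uniform(-self.eps, self.eps), 0.0, 1.0)
        if self.rng.random() < p_forward:
            self.state = min(self.state + 1, self.n_states - 1)
        else:
            self.state = max(self.state - 1, 0)
        reward = 1.0 if self.state == self.n_states - 1 else 0.0
        return self.state, reward


def q_learning(env, n_steps=20000, alpha=0.1, gamma=0.95, explore=0.1):
    # Standard tabular Q-learning; the update rule is unchanged, only the
    # environment drifts within an epsilon bound between steps.
    q = np.zeros((env.n_states, 2))
    s = env.reset()
    rng = np.random.default_rng(1)
    for _ in range(n_steps):
        a = rng.integers(2) if rng.random() < explore else int(np.argmax(q[s]))
        s2, r = env.step(a)
        q[s, a] += alpha * (r + gamma * np.max(q[s2]) - q[s, a])
        s = s2
    return q


if __name__ == "__main__":
    q = q_learning(EpsilonPerturbedChain(eps=0.05))
    print("Greedy policy:", np.argmax(q, axis=1))  # mostly 1 ("move right")
```

With a small ε, the greedy policy recovered is typically the near-optimal "move right" policy, echoing the abstract's claim that Q-learning can find near-optimal policies when the environment varies within bounds.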

Original language: English
Pages (from-to): 145-174
Number of pages: 30
Journal: Journal of Machine Learning Research
Volume: 3
Issue number: 1
Publication status: Published - Jan 1, 2003

Keywords

  • Convergence
  • Event-learning
  • Generalized MDP
  • MDP
  • Reinforcement learning
  • SARSA
  • SDS controller
  • ε-MDP

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Statistics and Probability
  • Artificial Intelligence
