Modular reinforcement learning: A case study in a robot domain

Zsolt Kalmár, Csaba Szepesvári, András Lörincz

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

The behaviour of reinforcement learning (RL) algorithms is best understood in completely observable, finite state- and action-space, discrete-time controlled Markov chains. Robot-learning domains, on the other hand, are inherently infinite in both time and space, and moreover they are only partially observable. In this article we suggest a systematic design method motivated by the desire to transform the task to be solved into a finite-state, discrete-time, "approximately" Markovian task that is also completely observable. The key idea is to break the problem up into subtasks and to design a controller for each subtask. Operating conditions are then attached to the controllers (a controller together with its operating condition is called a module), and additional features may be designed to facilitate observability. A new discrete time-counter is introduced at the "module level" that ticks only when a change in the value of one of the features is observed. The approach was tried out on a real-life robot. Several RL algorithms were compared, and a model-based approach was found to work best. The learnt switching strategy performed as well as a handcrafted version. Moreover, the learnt strategy appeared to exploit properties of the environment that could not have been foreseen, which suggests the promising possibility that a learnt controller may eventually outperform a handcrafted switching strategy.
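
The record contains no code, but the module idea described in the abstract (a controller bundled with an operating condition, an event-driven module-level clock, and a learnt switching strategy) can be illustrated with a short sketch. The Python below is a minimal, hypothetical rendering under assumed interfaces: `Module`, `ModuleSwitcher`, `env`, and `featurize` are invented names, and tabular Q-learning merely stands in for the model-based learner the authors found to work best.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# The feature vector is the "module-level" state: a small tuple of discrete values.
Features = Tuple[int, ...]


@dataclass
class Module:
    """A module = a hand-designed controller plus its operating condition."""
    name: str
    operating_condition: Callable[[Features], bool]   # when may this module be activated?
    controller: Callable[[object], object]            # maps a raw observation to a low-level action


class ModuleSwitcher:
    """Tabular Q-learning over (feature vector, module) pairs.

    Time advances only when the feature vector changes, so one module-level
    step may span many low-level control cycles. (The paper reports that a
    model-based learner worked best; plain Q-learning keeps this sketch short.)
    """

    def __init__(self, modules: List[Module], alpha=0.1, gamma=0.95, epsilon=0.1):
        self.modules = modules
        self.q: Dict[Tuple[Features, str], float] = {}
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def admissible(self, feats: Features) -> List[Module]:
        # Only modules whose operating condition holds may be switched on.
        return [m for m in self.modules if m.operating_condition(feats)]

    def select(self, feats: Features) -> Module:
        choices = self.admissible(feats)              # assumed non-empty in every state
        if random.random() < self.epsilon:
            return random.choice(choices)
        return max(choices, key=lambda m: self.q.get((feats, m.name), 0.0))

    def update(self, feats, module, reward, next_feats):
        best_next = max((self.q.get((next_feats, m.name), 0.0)
                         for m in self.admissible(next_feats)), default=0.0)
        key = (feats, module.name)
        old = self.q.get(key, 0.0)
        self.q[key] = old + self.alpha * (reward + self.gamma * best_next - old)


def run_episode(env, featurize, switcher, max_low_level_steps=10_000):
    """Event-driven loop: the active module keeps control of the robot until
    the feature vector changes, at which point the module-level clock ticks."""
    obs = env.reset()                      # `env` and `featurize` are assumed interfaces
    feats = featurize(obs)
    module = switcher.select(feats)
    reward_acc = 0.0
    for _ in range(max_low_level_steps):
        obs, reward, done = env.step(module.controller(obs))
        reward_acc += reward
        new_feats = featurize(obs)
        if new_feats != feats or done:     # the module-level time counter ticks here
            switcher.update(feats, module, reward_acc, new_feats)
            feats, reward_acc = new_feats, 0.0
            if done:
                break
            module = switcher.select(feats)
```

The point the sketch tries to capture is that low-level control runs at the robot's native rate, while switching decisions and learning updates happen only at feature-change events, which is what keeps the module-level task finite, discrete-time, and approximately Markovian.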

Original language: English
Pages (from-to): 507-522
Number of pages: 16
Journal: Acta Cybernetica
Volume: 14
Issue number: 3
Publication status: Published - 1999

Fingerprint

  • Reinforcement learning
  • Learning algorithms
  • Controlled Markov chains
  • Markov processes
  • Discrete time
  • Robots
  • Robot learning
  • Controllers
  • Controller design
  • Design method
  • Observability
  • Modules
  • Model-based
  • Strategy

ASJC Scopus subject areas

  • Hardware and Architecture
  • Software

Cite this

Kalmár, Z., Szepesvári, C., & Lörincz, A. (1999). Modular reinforcement learning: A case study in a robot domain. Acta Cybernetica, 14(3), 507-522.
@article{b5da24a34eac4de58efbb1b57da33007,
title = "Modular reinforcement learning: A case study in a robot domain",
abstract = "The behaviour of reinforcement learning (RL) algorithms is best understood in completely observable, finite state- and action-space, discrete-time controlled Markov-chains. Robot-learning domains, on the other hand, are inherently infinite both in time and space, and moreover they are only partially observable. In this article we suggest a systematic design method whose motivation comes from the desire to transform the task-to-be-solved into a finite-state, discrete-time, {"}approximately{"} Markovian task, which is completely observable, too. The key idea is to break up the problem into subtasks and design controllers for each of the subtasks. Then operating conditions are attached to the controllers (together the controllers and their operating conditions which are called modules) and possible additional features are designed to facilitate observability. A new discrete time-counter is introduced at the {"}module-level{"} that clicks only when a change in the value of one of the features is observed. The approach was tried out on a real-life robot. Several RL algorithms were compared and it was found that a model-based approach worked best. The learnt switching strategy performed equally well as a handcrafted version. Moreover, the learnt strategy seemed to exploit certain properties of the environment which could not have been seen in advance, which predicted the promising possibility that a learnt controller might overperform a handcrafted switching strategy in the future.",
author = "Zsolt Kalm{\'a}r and Csaba Szepesv{\'a}ri and Andr{\'a}s L{\"o}rincz",
year = "1999",
language = "English",
volume = "14",
pages = "507--522",
journal = "Acta Cybernetica",
issn = "0324-721X",
publisher = "University of Szeged",
number = "3",

}

TY - JOUR
T1 - Modular reinforcement learning
T2 - A case study in a robot domain
AU - Kalmár, Zsolt
AU - Szepesvári, Csaba
AU - Lörincz, András
PY - 1999
Y1 - 1999
N2 - The behaviour of reinforcement learning (RL) algorithms is best understood in completely observable, finite state- and action-space, discrete-time controlled Markov-chains. Robot-learning domains, on the other hand, are inherently infinite both in time and space, and moreover they are only partially observable. In this article we suggest a systematic design method whose motivation comes from the desire to transform the task-to-be-solved into a finite-state, discrete-time, "approximately" Markovian task, which is completely observable, too. The key idea is to break up the problem into subtasks and design controllers for each of the subtasks. Then operating conditions are attached to the controllers (together the controllers and their operating conditions which are called modules) and possible additional features are designed to facilitate observability. A new discrete time-counter is introduced at the "module-level" that clicks only when a change in the value of one of the features is observed. The approach was tried out on a real-life robot. Several RL algorithms were compared and it was found that a model-based approach worked best. The learnt switching strategy performed equally well as a handcrafted version. Moreover, the learnt strategy seemed to exploit certain properties of the environment which could not have been seen in advance, which predicted the promising possibility that a learnt controller might overperform a handcrafted switching strategy in the future.
AB - The behaviour of reinforcement learning (RL) algorithms is best understood in completely observable, finite state- and action-space, discrete-time controlled Markov-chains. Robot-learning domains, on the other hand, are inherently infinite both in time and space, and moreover they are only partially observable. In this article we suggest a systematic design method whose motivation comes from the desire to transform the task-to-be-solved into a finite-state, discrete-time, "approximately" Markovian task, which is completely observable, too. The key idea is to break up the problem into subtasks and design controllers for each of the subtasks. Then operating conditions are attached to the controllers (together the controllers and their operating conditions which are called modules) and possible additional features are designed to facilitate observability. A new discrete time-counter is introduced at the "module-level" that clicks only when a change in the value of one of the features is observed. The approach was tried out on a real-life robot. Several RL algorithms were compared and it was found that a model-based approach worked best. The learnt switching strategy performed equally well as a handcrafted version. Moreover, the learnt strategy seemed to exploit certain properties of the environment which could not have been seen in advance, which predicted the promising possibility that a learnt controller might overperform a handcrafted switching strategy in the future.
UR - http://www.scopus.com/inward/record.url?scp=30844451786&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=30844451786&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:30844451786
VL - 14
SP - 507
EP - 522
JO - Acta Cybernetica
JF - Acta Cybernetica
SN - 0324-721X
IS - 3
ER -