Module-Based Reinforcement Learning: Experiments with a Real Robot

Zsolt Kalmár, Csaba Szepesvári, A. Lőrincz

Research output: Contribution to journal › Article

25 Citations (Scopus)

Abstract

The behavior of reinforcement learning (RL) algorithms is best understood in completely observable, discrete-time controlled Markov chains with finite state and action spaces. In contrast, robot-learning domains are inherently continuous both in time and space, and moreover are partially observable. Here we suggest a systematic approach to solving such problems in which the available qualitative and quantitative knowledge is used to reduce the complexity of the learning task. The steps of the design process are to: i) decompose the task into subtasks using the qualitative knowledge at hand; ii) design local controllers to solve the subtasks using the available quantitative knowledge; and iii) learn a coordination of these controllers by means of reinforcement learning. It is argued that the approach enables fast, semi-automatic, but still high-quality robot control, as no fine-tuning of the local controllers is needed. The approach was verified on a non-trivial real-life robot task. Several RL algorithms were compared by ANOVA, and it was found that the model-based approach worked significantly better than the model-free approach. The learnt switching strategy performed comparably to a handcrafted version. Moreover, the learnt strategy seemed to exploit certain properties of the environment which were not foreseen in advance, thus supporting the view that adaptive algorithms are advantageous over non-adaptive ones in complex environments.
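The abstract's three design steps lend themselves to a compact illustration. The sketch below is a hypothetical Python rendering of step iii), not the authors' implementation: hand-designed local controllers ("modules") sit behind a common interface, and a tabular Q-learning switch learns, as a function of a discrete feature vector, which module to activate next. The Module, SwitchingLearner and run_episode names, as well as the env and features interfaces, are assumptions introduced for this example; the paper itself found a model-based learner (which would additionally estimate transition probabilities between feature states under each module and plan with them) to work significantly better than a model-free one of this kind.

# Minimal sketch of module-based RL (illustrative only, not the authors' code):
# hand-designed local controllers are coordinated by a learnt switching policy.
import random
from collections import defaultdict


class Module:
    """A hand-designed local controller that solves one subtask."""
    def __init__(self, name, policy, subgoal_reached):
        self.name = name
        self.policy = policy                    # raw observation -> low-level command
        self.subgoal_reached = subgoal_reached  # raw observation -> bool

    def run(self, env, obs):
        """Apply the local control law until the module's subgoal test fires."""
        while not self.subgoal_reached(obs) and not env.finished():
            obs = env.step(self.policy(obs))
        return obs


class SwitchingLearner:
    """Tabular Q-learning over (feature state, module index) pairs."""
    def __init__(self, n_modules, alpha=0.1, gamma=0.95, epsilon=0.1):
        self.q = defaultdict(float)
        self.n_modules = n_modules
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def select(self, s):
        """Epsilon-greedy choice of which module to activate in feature state s."""
        if random.random() < self.epsilon:
            return random.randrange(self.n_modules)
        return max(range(self.n_modules), key=lambda m: self.q[(s, m)])

    def update(self, s, m, reward, s_next, terminal):
        """One-step Q-learning backup at the level of module switches."""
        target = reward
        if not terminal:
            target += self.gamma * max(self.q[(s_next, k)] for k in range(self.n_modules))
        self.q[(s, m)] += self.alpha * (target - self.q[(s, m)])


def run_episode(env, features, modules, learner):
    """One training episode; env and features are assumed interfaces
    (reset/step/finished/reward, and a raw-observation -> hashable feature map)."""
    obs = env.reset()
    while not env.finished():
        s = features(obs)
        m = learner.select(s)
        obs = modules[m].run(env, obs)   # run the chosen controller to its subgoal
        s_next = features(obs)
        learner.update(s, m, env.reward(), s_next, env.finished())

Because a module runs for many primitive time steps before control returns to the switch, the decision problem seen by the learner is effectively semi-Markovian over the feature space; the sketch ignores the corresponding per-step discounting for brevity.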

Original language: English
Pages (from-to): 55-85
Number of pages: 31
Journal: Machine Learning
Volume: 31
Issue number: 1-3
Publication status: Published - 1998

Fingerprint

Reinforcement learning
Robots
Learning algorithms
Controllers
Robot learning
Experiments
Analysis of variance (ANOVA)
Adaptive algorithms
Markov processes
Tuning

Keywords

  • Feature space
  • Local control
  • Markovian Decision Problems
  • Module-based RL
  • Problem decomposition
  • Reinforcement learning
  • Robot learning
  • Subgoals
  • Switching control

ASJC Scopus subject areas

  • Artificial Intelligence
  • Control and Systems Engineering

Cite this

Kalmár, Z., Szepesvári, C., & Lőrincz, A. (1998). Module-Based Reinforcement Learning: Experiments with a Real Robot. Machine Learning, 31(1-3), 55-85.
@article{75d09d5a61834e288446b4b193025edc,
title = "Module-Based Reinforcement Learning: Experiments with a Real Robot",
abstract = "The behavior of reinforcement learning (RL) algorithms is best understood in completely observable, discrete-time controlled Markov chains with finite state and action spaces. In contrast, robot-learning domains are inherently continuous both in time and space, and moreover are partially observable. Here we suggest a systematic approach to solve such problems in which the available qualitative and quantitative knowledge is used to reduce the complexity of learning task. The steps of the design process are to: i) decompose the task into subtasks using the qualitative knowledge at hand; ii) design local controllers to solve the subtasks using the available quantitative knowledge and iii) learn a coordination of these controllers by means of reinforcement learning. It is argued that the approach enables fast, semi-automatic, but still high quality robot-control as no fine-tuning of the local controllers is needed. The approach was verified on a non-trivial real-life robot task. Several RL algorithms were compared by ANOVA and it was found that the model-based approach worked significantly better than the model-free approach. The learnt switching strategy performed comparably to a handcrafted version. Moreover, the learnt strategy seemed to exploit certain properties of the environment which were not foreseen in advance, thus supporting the view that adaptive algorithms are advantageous to non-adaptive ones in complex environments.",
keywords = "Feature space, Local control, Markovian Decision Problems, Module-based RL, Problem decomposition, Reinforcement learning, Robot learning, Subgoals, Switching control",
author = "Zsolt Kalm{\'a}r and Csaba Szepesv{\'a}ri and A. Lőrincz",
year = "1998",
language = "English",
volume = "31",
pages = "55--85",
journal = "Machine Learning",
issn = "0885-6125",
publisher = "Springer Netherlands",
number = "1-3",

}
