Learning to play using low-complexity rule-based policies: Illustrations through Ms. Pac-Man

István Szita, A. Lőrincz

Research output: Contribution to journal › Article

50 Citations (Scopus)

Abstract

In this article we propose a method that can deal with certain combinatorial reinforcement learning tasks. We demonstrate the approach in the popular Ms. Pac-Man game. We define a set of high-level observation and action modules, from which rule-based policies are constructed automatically. In these policies, actions are temporally extended, and may work concurrently. The policy of the agent is encoded by a compact decision list. The components of the list are selected from a large pool of rules, which can be either hand-crafted or generated automatically. A suitable selection of rules is learnt by the cross-entropy method, a recent global optimization algorithm that fits our framework smoothly. Cross-entropy-optimized policies perform better than our hand-crafted policy, and reach the score of average human players. We argue that learning is successful mainly because (i) policies may apply concurrent actions and thus the policy space is sufficiently rich, (ii) the search is biased towards low-complexity policies and therefore, solutions with a compact description can be found quickly if they exist.
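
The rule-selection step described above lends itself to a compact illustration. The following Python sketch shows a cross-entropy method over Bernoulli inclusion probabilities, one per rule in the pool. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name, all parameter values, and the evaluate callback are hypothetical, with evaluate standing in for playing Ms. Pac-Man with the decision list induced by a rule subset and returning the achieved score.

import numpy as np

def cross_entropy_rule_selection(evaluate, n_rules, n_samples=100,
                                 elite_frac=0.1, alpha=0.6, n_iters=50):
    # Each rule in the pool is included in the decision list independently
    # with probability p[i]; the cross-entropy method adapts p toward
    # high-scoring rule subsets. Parameter values are assumptions, not
    # the paper's settings.
    p = np.full(n_rules, 0.5)                      # start unbiased
    n_elite = max(1, int(elite_frac * n_samples))
    for _ in range(n_iters):
        # Draw candidate policies as binary rule-inclusion masks.
        masks = np.random.rand(n_samples, n_rules) < p
        # evaluate is a placeholder: build the decision list from the
        # selected rules, play the game, return the score.
        scores = np.array([evaluate(m) for m in masks])
        # Keep the top-scoring (elite) fraction of candidates.
        elite = masks[np.argsort(scores)[-n_elite:]]
        # Smoothed update: move p toward the elite sample mean.
        p = (1 - alpha) * p + alpha * elite.mean(axis=0)
    return p  # high p[i] marks rules worth keeping in the decision list

Rules whose final probability is high are the natural candidates for the compact decision list; the smoothing factor alpha keeps the distribution from collapsing prematurely onto an early elite sample.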

Original language: English
Pages (from-to): 659-684
Number of pages: 26
Journal: Journal of Artificial Intelligence Research
Volume: 30
Publication status: Published - Sep 2007

Fingerprint

  • Entropy
  • Reinforcement learning
  • Global optimization

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Artificial Intelligence

Cite this

Learning to play using low-complexity rule-based policies: Illustrations through Ms. Pac-Man. / Szita, István; Lőrincz, A.

In: Journal of Artificial Intelligence Research, Vol. 30, 09.2007, pp. 659-684.

Research output: Contribution to journal › Article

@article{0e93b2b09d744dd895c280ccbf9e8632,
title = "Learning to play using low-complexity rule-based policies: Illustrations through Ms. Pac-Man",
abstract = "In this article we propose a method that can deal with certain combinatorial reinforcement learning tasks. We demonstrate the approach in the popular Ms. Pac-Man game. We define a set of high-level observation and action modules, from which rule-based policies are constructed automatically. In these policies, actions are temporally extended, and may work concurrently. The policy of the agent is encoded by a compact decision list. The components of the list are selected from a large pool of rules, which can be either hand-crafted or generated automatically. A suitable selection of rules is learnt by the cross-entropy method, a recent global optimization algorithm that fits our framework smoothly. Cross-entropy-optimized policies perform better than our hand-crafted policy, and reach the score of average human players. We argue that learning is successful mainly because (i) policies may apply concurrent actions and thus the policy space is sufficiently rich, (ii) the search is biased towards low-complexity policies and therefore, solutions with a compact description can be found quickly if they exist.",
author = "Istv{\'a}n Szita and A. L{\H{o}}rincz",
year = "2007",
month = sep,
language = "English",
volume = "30",
pages = "659--684",
journal = "Journal of Artificial Intelligence Research",
issn = "1076-9757",
publisher = "Morgan Kaufmann Publishers, Inc.",
}

TY  - JOUR
T1  - Learning to play using low-complexity rule-based policies
T2  - Illustrations through Ms. Pac-Man
AU  - Szita, István
AU  - Lőrincz, A.
PY  - 2007/9
Y1  - 2007/9
N2  - In this article we propose a method that can deal with certain combinatorial reinforcement learning tasks. We demonstrate the approach in the popular Ms. Pac-Man game. We define a set of high-level observation and action modules, from which rule-based policies are constructed automatically. In these policies, actions are temporally extended, and may work concurrently. The policy of the agent is encoded by a compact decision list. The components of the list are selected from a large pool of rules, which can be either hand-crafted or generated automatically. A suitable selection of rules is learnt by the cross-entropy method, a recent global optimization algorithm that fits our framework smoothly. Cross-entropy-optimized policies perform better than our hand-crafted policy, and reach the score of average human players. We argue that learning is successful mainly because (i) policies may apply concurrent actions and thus the policy space is sufficiently rich, (ii) the search is biased towards low-complexity policies and therefore, solutions with a compact description can be found quickly if they exist.
UR  - http://www.scopus.com/inward/record.url?scp=38349162555&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=38349162555&partnerID=8YFLogxK
M3  - Article
AN  - SCOPUS:38349162555
VL  - 30
SP  - 659
EP  - 684
JO  - Journal of Artificial Intelligence Research
JF  - Journal of Artificial Intelligence Research
SN  - 1076-9757
ER  -