PIRANHA: Policy iteration for recurrent artificial neural networks with hidden activities

István Szita, A. Lőrincz

Research output: Contribution to journal › Article

1 Citation (Scopus)

Abstract

It is an intriguing task to develop efficient connectionist representations for learning long time series. Recurrent neural networks hold great promise here. We model the learning task as the minimization of a nonlinear least-squares cost function that takes into account both one-step and multi-step prediction errors. The special structure of the cost function is constructed to build a bridge to reinforcement learning. We exploit this connection to derive a convergent, policy-iteration-based algorithm, and we show that RNN training can be made to fit the reinforcement learning framework in a natural fashion. The relevance of this connection is discussed. We also present experimental results that demonstrate the appealing properties of the unique parameter structure prescribed by reinforcement learning. Experiments cover both sequence learning and long-term prediction.
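To make the abstract's cost function concrete, the following is a minimal sketch of a nonlinear least-squares cost that mixes one-step and multi-step prediction errors. Everything in it is illustrative rather than taken from the paper: the toy tanh model rnn_step, the horizon K, and the geometric weight lam are assumptions, chosen so that the discounted multi-step terms hint at the bridge to reinforcement learning described above.

import numpy as np

def rnn_step(W, x):
    # One step of a toy recurrent model: predicts x_{t+1} as tanh(W x_t).
    # (Illustrative stand-in; not the network architecture of the paper.)
    return np.tanh(W @ x)

def prediction_cost(W, xs, K=5, lam=0.5):
    # Nonlinear least-squares cost over the sequence xs: for each start
    # index t, roll the model forward up to K steps and accumulate the
    # squared k-step prediction error, weighted by lam**(k - 1) so the
    # one-step term has weight 1 and longer horizons are discounted.
    cost = 0.0
    T = len(xs)
    for t in range(T - 1):
        pred = xs[t]
        for k in range(1, min(K, T - 1 - t) + 1):
            pred = rnn_step(W, pred)      # k-step-ahead prediction from x_t
            err = xs[t + k] - pred
            cost += lam ** (k - 1) * float(err @ err)
    return cost

# Usage on random data: a 3-unit network and a length-20 sequence.
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((3, 3))
xs = [rng.standard_normal(3) for _ in range(20)]
print(prediction_cost(W, xs))

The geometric weighting lam**(k - 1) is one plausible way such a cost comes to resemble a discounted return, which is the kind of structural correspondence with reinforcement learning that the abstract alludes to.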

Original language: English
Pages (from-to): 577-591
Number of pages: 15
Journal: Neurocomputing
Volume: 70
Issue number: 1-3
DOIs: 10.1016/j.neucom.2005.09.017
Publication status: Published - Dec 2006

Keywords

  • Multi-step prediction
  • Policy iteration
  • Recurrent neural networks
  • Sequence learning

ASJC Scopus subject areas

  • Artificial Intelligence
  • Cellular and Molecular Neuroscience

Cite this

PIRANHA: Policy iteration for recurrent artificial neural networks with hidden activities. / Szita, István; Lőrincz, A.

In: Neurocomputing, Vol. 70, No. 1-3, 12.2006, p. 577-591.

Research output: Contribution to journal › Article

@article{a9cfe26ed79846259ef5082a5a43bec7,
title = "PIRANHA: Policy iteration for recurrent artificial neural networks with hidden activities",
abstract = "It is an intriguing task to develop efficient connectionist representations for learning long time series. Recurrent neural networks hold great promise here. We model the learning task as the minimization of a nonlinear least-squares cost function that takes into account both one-step and multi-step prediction errors. The special structure of the cost function is constructed to build a bridge to reinforcement learning. We exploit this connection to derive a convergent, policy-iteration-based algorithm, and we show that RNN training can be made to fit the reinforcement learning framework in a natural fashion. The relevance of this connection is discussed. We also present experimental results that demonstrate the appealing properties of the unique parameter structure prescribed by reinforcement learning. Experiments cover both sequence learning and long-term prediction.",
keywords = "Multi-step prediction, Policy iteration, Recurrent neural networks, Sequence learning",
author = "Istv{\'a}n Szita and A. L{\H{o}}rincz",
year = "2006",
month = "12",
doi = "10.1016/j.neucom.2005.09.017",
language = "English",
volume = "70",
pages = "577--591",
journal = "Neurocomputing",
issn = "0925-2312",
publisher = "Elsevier",
number = "1-3",
}

TY  - JOUR
T1  - PIRANHA
T2  - Policy iteration for recurrent artificial neural networks with hidden activities
AU  - Szita, István
AU  - Lőrincz, A.
PY  - 2006/12
Y1  - 2006/12
AB  - It is an intriguing task to develop efficient connectionist representations for learning long time series. Recurrent neural networks hold great promise here. We model the learning task as the minimization of a nonlinear least-squares cost function that takes into account both one-step and multi-step prediction errors. The special structure of the cost function is constructed to build a bridge to reinforcement learning. We exploit this connection to derive a convergent, policy-iteration-based algorithm, and we show that RNN training can be made to fit the reinforcement learning framework in a natural fashion. The relevance of this connection is discussed. We also present experimental results that demonstrate the appealing properties of the unique parameter structure prescribed by reinforcement learning. Experiments cover both sequence learning and long-term prediction.
KW  - Multi-step prediction
KW  - Policy iteration
KW  - Recurrent neural networks
KW  - Sequence learning
UR  - http://www.scopus.com/inward/record.url?scp=33750973390&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=33750973390&partnerID=8YFLogxK
DO  - 10.1016/j.neucom.2005.09.017
M3  - Article
AN  - SCOPUS:33750973390
VL  - 70
SP  - 577
EP  - 591
JO  - Neurocomputing
JF  - Neurocomputing
SN  - 0925-2312
IS  - 1-3
ER  -