Adaptive stochastic resource control: A machine learning approach

Balázs Csanád Csáji, L. Monostori

Research output: Contribution to journal › Article

8 Citations (Scopus)

Abstract

The paper investigates stochastic resource allocation problems with scarce, reusable resources and non-preemptive, time-dependent, interconnected tasks. This approach is a natural generalization of several standard resource management problems, such as scheduling and transportation problems. First, reactive solutions are considered and defined as control policies of suitably reformulated Markov decision processes (MDPs). We argue that this reformulation has several favorable properties: it has finite state and action spaces and it is aperiodic, hence all policies are proper and the space of control policies can be safely restricted. Next, approximate dynamic programming (ADP) methods, such as fitted Q-learning, are suggested for computing an efficient control policy. In order to compactly maintain the cost-to-go function, two representations are studied: hash tables and support vector regression (SVR), particularly ν-SVRs. Several additional improvements, such as the application of limited-lookahead rollout algorithms in the initial phases, action space decomposition, task clustering and distributed sampling, are also investigated. Finally, experimental results on both benchmark and industry-related data are presented.
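To make the idea of a hash-table cost-to-go representation concrete, here is a minimal sketch. It is not the authors' reformulation or their fitted Q-learning with SVR. It is plain tabular Q-learning on a hypothetical single-resource toy problem, where the task names, weights, durations, cost model and hyperparameters are all assumptions made for illustration. A state is the set of unfinished tasks, an action starts one of them, and the cost of a step is the sampled duration times the total weight of the tasks still unfinished.

```python
import random
from collections import defaultdict

# Hypothetical toy instance (an assumption, not from the paper):
# task name -> (weight, possible durations, sampled uniformly).
TASKS = {"a": (5, (1, 2)), "b": (1, (4, 6)), "c": (3, (2, 4))}


def step(state, task, rng):
    """Run `task` on the single resource; return (cost, next_state)."""
    duration = rng.choice(TASKS[task][1])
    # Every unfinished task (including the running one) accrues its weight.
    cost = duration * sum(TASKS[t][0] for t in state)
    return cost, state - {task}


def q_learning(episodes=5000, alpha=0.1, epsilon=0.2, seed=0):
    """Undiscounted, episodic Q-learning that minimizes expected total cost."""
    rng = random.Random(seed)
    q = defaultdict(float)  # hash table keyed by (state, action)
    for _ in range(episodes):
        state = frozenset(TASKS)
        while state:  # the empty set is the terminal state
            actions = sorted(state)  # sorted for reproducible sampling
            if rng.random() < epsilon:
                task = rng.choice(actions)  # explore
            else:
                task = min(actions, key=lambda t: q[(state, t)])  # exploit
            cost, nxt = step(state, task, rng)
            # Cost-to-go of the successor is 0 once no tasks remain.
            best_next = min((q[(nxt, t)] for t in nxt), default=0.0)
            q[(state, task)] += alpha * (cost + best_next - q[(state, task)])
            state = nxt
    return q


def greedy_order(q):
    """Read off the task order chosen by the greedy policy under `q`."""
    state, order = frozenset(TASKS), []
    while state:
        task = min(sorted(state), key=lambda t: q[(state, t)])
        order.append(task)
        state = state - {task}
    return order


if __name__ == "__main__":
    print(greedy_order(q_learning()))
```

Every episode in this toy problem terminates after exactly three decisions, so every policy is proper and no discounting is needed, loosely mirroring the property the abstract claims for the reformulated MDP. For this particular instance the order that minimizes expected cost can be checked by hand (highest weight per unit of expected duration first, giving a, c, b), which offers a sanity check on the learned table.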

Original language: English
Pages (from-to): 453-486
Number of pages: 34
Journal: Journal of Artificial Intelligence Research
Volume: 32
Publication status: Published - May 2008

Fingerprint

Learning systems
Dynamic programming
Resource allocation
Scheduling
Sampling
Decomposition
Costs
Industry

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Adaptive stochastic resource control: A machine learning approach. / Csáji, Balázs Csanád; Monostori, L.

In: Journal of Artificial Intelligence Research, Vol. 32, 05.2008, p. 453-486.

@article{5499de34a91b492c86656f2021a998cd,
title = "Adaptive stochastic resource control: A machine learning approach",
abstract = "The paper investigates stochastic resource allocation problems with scarce, reusable resources and non-preemptive, time-dependent, interconnected tasks. This approach is a natural generalization of several standard resource management problems, such as scheduling and transportation problems. First, reactive solutions are considered and defined as control policies of suitably reformulated Markov decision processes (MDPs). We argue that this reformulation has several favorable properties: it has finite state and action spaces and it is aperiodic, hence all policies are proper and the space of control policies can be safely restricted. Next, approximate dynamic programming (ADP) methods, such as fitted Q-learning, are suggested for computing an efficient control policy. In order to compactly maintain the cost-to-go function, two representations are studied: hash tables and support vector regression (SVR), particularly $\nu$-SVRs. Several additional improvements, such as the application of limited-lookahead rollout algorithms in the initial phases, action space decomposition, task clustering and distributed sampling, are also investigated. Finally, experimental results on both benchmark and industry-related data are presented.",
author = "Cs{\'a}ji, {Bal{\'a}zs Csan{\'a}d} and L. Monostori",
year = "2008",
month = "5",
language = "English",
volume = "32",
pages = "453--486",
journal = "Journal of Artificial Intelligence Research",
issn = "1076-9757",
publisher = "Morgan Kaufmann Publishers, Inc.",

}

TY - JOUR

T1 - Adaptive stochastic resource control

T2 - A machine learning approach

AU - Csáji, Balázs Csanád

AU - Monostori, L.

PY - 2008/5

Y1 - 2008/5

N2 - The paper investigates stochastic resource allocation problems with scarce, reusable resources and non-preemptive, time-dependent, interconnected tasks. This approach is a natural generalization of several standard resource management problems, such as scheduling and transportation problems. First, reactive solutions are considered and defined as control policies of suitably reformulated Markov decision processes (MDPs). We argue that this reformulation has several favorable properties: it has finite state and action spaces and it is aperiodic, hence all policies are proper and the space of control policies can be safely restricted. Next, approximate dynamic programming (ADP) methods, such as fitted Q-learning, are suggested for computing an efficient control policy. In order to compactly maintain the cost-to-go function, two representations are studied: hash tables and support vector regression (SVR), particularly ν-SVRs. Several additional improvements, such as the application of limited-lookahead rollout algorithms in the initial phases, action space decomposition, task clustering and distributed sampling, are also investigated. Finally, experimental results on both benchmark and industry-related data are presented.

AB - The paper investigates stochastic resource allocation problems with scarce, reusable resources and non-preemptive, time-dependent, interconnected tasks. This approach is a natural generalization of several standard resource management problems, such as scheduling and transportation problems. First, reactive solutions are considered and defined as control policies of suitably reformulated Markov decision processes (MDPs). We argue that this reformulation has several favorable properties: it has finite state and action spaces and it is aperiodic, hence all policies are proper and the space of control policies can be safely restricted. Next, approximate dynamic programming (ADP) methods, such as fitted Q-learning, are suggested for computing an efficient control policy. In order to compactly maintain the cost-to-go function, two representations are studied: hash tables and support vector regression (SVR), particularly ν-SVRs. Several additional improvements, such as the application of limited-lookahead rollout algorithms in the initial phases, action space decomposition, task clustering and distributed sampling, are also investigated. Finally, experimental results on both benchmark and industry-related data are presented.

UR - http://www.scopus.com/inward/record.url?scp=52249117905&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=52249117905&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:52249117905

VL - 32

SP - 453

EP - 486

JO - Journal of Artificial Intelligence Research

JF - Journal of Artificial Intelligence Research

SN - 1076-9757

ER -