Bayes-adaptive hierarchical MDPs

Ngo Anh Vien, SeungGwan Lee, TaeChoong Chung

Research output: Contribution to journal › Article › peer-review

5 Citations (Scopus)

Abstract

Reinforcement learning (RL) is an area of machine learning concerned with how an agent learns to make decisions sequentially in order to optimize a particular performance measure. To achieve this goal, the agent must choose between 1) exploiting previously acquired knowledge, which may lead only to a local optimum, and 2) exploring to gather new knowledge that is expected to improve current performance. Among RL algorithms, Bayesian model-based RL (BRL) is well known for trading off exploitation and exploration optimally via belief planning, i.e., by solving an equivalent partially observable Markov decision process (POMDP). However, solving that POMDP often suffers from the curse of dimensionality and the curse of history. In this paper, we make two major contributions: 1) a framework that integrates temporal abstraction into BRL, yielding a hierarchical POMDP formulation that can be solved online with a hierarchical sample-based planning solver; and 2) a subgoal discovery method for hierarchical BRL that automatically discovers useful macro actions to accelerate learning. In the experiments, we demonstrate that the proposed approach scales to much larger problems and that the agent discovers useful subgoals that speed up Bayesian reinforcement learning.
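
To make the belief-planning idea behind the abstract concrete, the following is a minimal illustrative sketch, not code from the paper: it maintains a Dirichlet posterior over transition dynamics and plans at each decision point by root-sampling models from that posterior and running Monte-Carlo rollouts, which is the flat (non-hierarchical) core of Bayes-adaptive planning. The paper's hierarchical POMDP formulation, macro actions, and subgoal discovery are not reproduced here, and all class, function, and parameter names below are assumptions.

```python
import numpy as np

# Illustrative sketch of flat Bayes-adaptive planning (not the authors' solver):
# a Dirichlet posterior over transition models plus sample-based lookahead.

class BayesAdaptiveRolloutPlanner:
    def __init__(self, n_states, n_actions, reward_fn, gamma=0.95,
                 prior=1.0, n_model_samples=20, horizon=30):
        self.nS, self.nA = n_states, n_actions
        self.reward_fn = reward_fn            # reward_fn(s, a, s_next) -> float (assumed)
        self.gamma = gamma
        self.horizon = horizon
        self.n_model_samples = n_model_samples
        # Dirichlet pseudo-counts encoding the belief over transition dynamics.
        self.counts = np.full((n_states, n_actions, n_states), prior)

    def update_belief(self, s, a, s_next):
        """Posterior update after observing one real transition."""
        self.counts[s, a, s_next] += 1.0

    def _sample_model(self, rng):
        """Draw one full transition model from the Dirichlet posterior."""
        T = np.empty((self.nS, self.nA, self.nS))
        for s in range(self.nS):
            for a in range(self.nA):
                T[s, a] = rng.dirichlet(self.counts[s, a])
        return T

    def _rollout(self, T, s, rng):
        """Discounted return of a random-policy rollout in a sampled model."""
        ret, discount = 0.0, 1.0
        for _ in range(self.horizon):
            a = rng.integers(self.nA)
            s_next = rng.choice(self.nS, p=T[s, a])
            ret += discount * self.reward_fn(s, a, s_next)
            discount *= self.gamma
            s = s_next
        return ret

    def plan(self, s, rng=None):
        """Pick the action with the highest mean return over sampled models."""
        rng = rng or np.random.default_rng()
        q = np.zeros(self.nA)
        for _ in range(self.n_model_samples):
            T = self._sample_model(rng)
            for a in range(self.nA):
                s_next = rng.choice(self.nS, p=T[s, a])
                q[a] += (self.reward_fn(s, a, s_next)
                         + self.gamma * self._rollout(T, s_next, rng))
        return int(np.argmax(q / self.n_model_samples))
```

The hierarchical extension described in the paper would additionally plan over temporally extended macro actions (options toward discovered subgoals) rather than only primitive actions, which is what lets the approach scale to larger problems.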
Original language: English
Pages (from-to): 112-126
Number of pages: 15
Journal: Applied Intelligence
Volume: 45
Issue number: 1
Early online date: 29 Jan 2016
DOIs
Publication status: Published - Jul 2016
