In this paper, we propose using hierarchical action decomposition to make Bayesian model-based reinforcement learning more efficient and feasible in practice. We formulate Bayesian hierarchical reinforcement learning as a partially observable semi-Markov decision process (POSMDP). The main POSMDP task is partitioned into a hierarchy of POSMDP subtasks; lower-level subtasks are solved first, followed by higher-level ones. For each POSMDP, we sample from a prior belief to build an approximate model and then solve it with the Monte Carlo Value Iteration with Macro-Actions solver. Experimental results show that our algorithm performs significantly better than flat Bayesian reinforcement learning (BRL) in both reward and, especially, solving time, which improves by at least one order of magnitude.
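The bottom-up solving order and the model-sampling step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the task names, the `solve_order` and `sample_transition_row` helpers, and the choice of a Dirichlet prior over next-state distributions are all assumptions made for illustration, and the solver itself is left out.

```python
import random

# Hypothetical task hierarchy (names are illustrative, not from the paper):
# each task maps to the subtasks it may invoke as macro-actions.
HIERARCHY = {
    "root": ["navigate", "pickup"],
    "navigate": [],
    "pickup": [],
}


def solve_order(hierarchy, root):
    """Return tasks in post-order, so every subtask precedes its parent."""
    order, seen = [], set()

    def visit(task):
        if task in seen:
            return
        seen.add(task)
        for child in hierarchy[task]:
            visit(child)
        order.append(task)

    visit(root)
    return order


def sample_transition_row(prior_counts, rng):
    """Draw one next-state distribution from an assumed Dirichlet prior.

    A Dirichlet sample is obtained by normalizing independent Gamma draws.
    """
    draws = [rng.gammavariate(alpha, 1.0) for alpha in prior_counts]
    total = sum(draws)
    return [d / total for d in draws]


rng = random.Random(0)
for task in solve_order(HIERARCHY, "root"):
    # One sampled row stands in for the approximate model of this subtask;
    # a real solver call would replace the print statement.
    row = sample_transition_row([1.0, 1.0, 1.0], rng)
    print(task, [round(p, 3) for p in row])
```

In this sketch the post-order traversal guarantees that a parent task is only reached after all of its subtasks, which mirrors the lower-level-first ordering stated above.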
|Title of host publication|International Conference on Autonomous Agents and Multi-Agent Systems, AAMAS '14, Paris, France, May 5-9, 2014|
|Subtitle of host publication|AAMAS|
|Number of pages|2|
|Publication status|Published - 2014|