Constrained representation learning for recurrent policy optimisation under uncertainty

Viet Hung Dang, Ngo Anh Vien*, Tae Choong Chung

*Corresponding author for this work

Research output: Contribution to journal › Article

Abstract

Learning to make decisions in partially observable environments is a notoriously difficult problem that requires complex controller representations. In most work, controllers are designed as a non-linear mapping from a sequence of temporal observations to actions. Such problems can, in principle, be formulated as a partially observable Markov decision process (POMDP) whose policy is parameterised by a recurrent neural network. In this paper, we propose an alternative framework that (a) uses a Long Short-Term Memory (LSTM) Encoder-Decoder to learn an internal state representation of the observation history and then (b) integrates this representation into existing recurrent policy models to improve task performance. The LSTM Encoder maps an input history of observations to an internal state representation. The LSTM Decoder can perform one of two decoding tasks: reconstructing the input observation sequence or predicting future observation sequences. The first decoder acts as an auto-encoder that guides and constrains the learning of an internal state useful for policy optimisation. The second decoder uses the internal state learnt by the encoder to predict future observation sequences, so that the network acts as a non-linear predictive state representation model. Both decoders impose constraints on the policy representation and thereby help guide both policy optimisation and latent state representation learning. Integrating representation learning with policy optimisation aims to enable more complex policies to be learnt and to improve performance on policy learning tasks.
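The encoder-decoder idea in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the names `LSTMCell`, `EncoderDecoder`, `decoder_targets` and `constrained_loss`, the weight `lam`, the window sizes `history_len` and `horizon`, and the choice to feed the decoder its own previous prediction are all hypothetical. The sketch also assumes that the decoder acts as a constraint through a weighted sum, `policy_loss + lam * decoder_mse`. It covers only the forward pass; training would require an autodiff framework.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng, scale=0.1):
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

class LSTMCell:
    """Plain LSTM cell: input (i), forget (f), output (o) gates and candidate (g)."""
    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        cols = input_size + hidden_size  # each gate acts on the concatenation [x, h]
        self.W = {k: rand_matrix(hidden_size, cols, rng) for k in "ifog"}
        self.b = {k: [0.0] * hidden_size for k in "ifog"}

    def step(self, x, h, c):
        z = list(x) + list(h)
        pre = {k: [a + b for a, b in zip(matvec(self.W[k], z), self.b[k])] for k in "ifog"}
        i = [sigmoid(v) for v in pre["i"]]
        f = [sigmoid(v) for v in pre["f"]]
        o = [sigmoid(v) for v in pre["o"]]
        g = [math.tanh(v) for v in pre["g"]]
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

class EncoderDecoder:
    """Hypothetical model: LSTM encoder, LSTM decoder and a softmax policy head on the internal state."""
    def __init__(self, obs_dim, hidden_size, n_actions, seed=0):
        rng = random.Random(seed)
        self.obs_dim = obs_dim
        self.encoder = LSTMCell(obs_dim, hidden_size, rng)
        self.decoder = LSTMCell(obs_dim, hidden_size, rng)
        self.readout = rand_matrix(obs_dim, hidden_size, rng)        # hidden -> observation
        self.policy_head = rand_matrix(n_actions, hidden_size, rng)  # hidden -> action logits

    def encode(self, history):
        # Run the encoder over the observation history; the final (h, c) is the internal state.
        h = [0.0] * self.encoder.hidden_size
        c = [0.0] * self.encoder.hidden_size
        for x in history:
            h, c = self.encoder.step(x, h, c)
        return h, c

    def decode(self, h, c, steps):
        # Start from the encoder state and feed back the previous predicted observation.
        x = [0.0] * self.obs_dim
        outputs = []
        for _ in range(steps):
            h, c = self.decoder.step(x, h, c)
            x = matvec(self.readout, h)
            outputs.append(x)
        return outputs

    def policy(self, h):
        # Numerically stable softmax over action logits computed from the internal state.
        logits = matvec(self.policy_head, h)
        m = max(logits)
        exps = [math.exp(z - m) for z in logits]
        s = sum(exps)
        return [e / s for e in exps]

def decoder_targets(observations, t, history_len, horizon, mode):
    # "reconstruct": auto-encoder target is the input history itself.
    # "predict": target is the next `horizon` observations after time t.
    history = observations[max(0, t - history_len + 1): t + 1]
    if mode == "reconstruct":
        return history, history
    if mode == "predict":
        return history, observations[t + 1: t + 1 + horizon]
    raise ValueError(f"unknown mode: {mode}")

def mse(preds, targets):
    errs = [(p - q) ** 2 for pv, qv in zip(preds, targets) for p, q in zip(pv, qv)]
    if not errs:
        raise ValueError("empty decoder target")
    return sum(errs) / len(errs)

def constrained_loss(model, observations, t, policy_loss, mode, lam=0.5, history_len=4, horizon=2):
    # policy_loss is a placeholder for the recurrent policy objective (assumed given).
    history, target = decoder_targets(observations, t, history_len, horizon, mode)
    h, c = model.encode(history)
    preds = model.decode(h, c, len(target))
    return policy_loss + lam * mse(preds, target)
```

With `mode="reconstruct"` the decoder behaves as the auto-encoder constraint. With `mode="predict"` it plays the role of the predictive-state-style constraint. In both cases the policy head reads the same internal state `h` that the decoder constrains.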

Original language: English
Journal: Adaptive Behavior
Early online date: 30 Dec 2019
Publication status: Early online date, 30 Dec 2019

Keywords

  • auto-encoder
  • partially observable Markov decision process
  • Policy optimisation
  • representation learning

ASJC Scopus subject areas

  • Experimental and Cognitive Psychology
  • Behavioral Neuroscience
