Document type:
Preprint or working paper
Permalink:
Title:
SENTINEL: Taming Uncertainty with Ensemble-based Distributional Reinforcement Learning
Author(s):
Eriksson, Hannes [Author]
Chalmers University of Technology [Gothenburg, Sweden]
Basu, Debabrota [Author]
Scool
Alibeigi, Mina [Author]
Dimitrakakis, Christos [Author]
University of Oslo [UiO]
Chalmers University of Technology [Gothenburg, Sweden]
HAL domain(s):
Computer Science [cs]/Artificial Intelligence [cs.AI]
Computer Science [cs]/Machine Learning [cs.LG]
Statistics [stat]/Computation [stat.CO]
Computer Science [cs]/Systems and Control [cs.SY]
Statistics [stat]/Applications [stat.AP]
English abstract: [en]
In this paper, we consider risk-sensitive sequential decision-making in model-based reinforcement learning (RL). We introduce a novel quantification of risk, namely composite risk, which takes into account both aleatory and epistemic risk during the learning process. Previous works have considered aleatory or epistemic risk individually, or an additive combination of the two. We demonstrate that the additive formulation is a particular case of the composite risk, which underestimates the actual CVaR risk even while learning a mixture of Gaussians. In contrast, the composite risk provides a more accurate estimate. We propose a bootstrapping method, SENTINEL-K, for distributional RL. SENTINEL-K uses an ensemble of K learners to estimate the return distribution, and additionally uses Follow The Regularized Leader (FTRL) from the bandit literature to provide a better estimate of the risk on the return distribution. Finally, we experimentally verify that SENTINEL-K estimates the return distribution better and, when used with the composite risk estimate, demonstrates better risk-sensitive performance than competing RL algorithms.
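The abstract's central quantity, the conditional value-at-risk (CVaR), can be estimated empirically as the mean of the worst alpha-fraction of sampled returns. The sketch below is a minimal illustration of that estimator, not the paper's SENTINEL-K implementation: the ensemble of (mean, std) Gaussian return models and all numbers are hypothetical, standing in for the K learners (epistemic spread across members, aleatory spread within each).

```python
import random

def cvar(samples, alpha=0.1):
    """Empirical CVaR_alpha: mean of the worst alpha-fraction of returns."""
    s = sorted(samples)  # ascending, so the worst returns come first
    k = max(1, int(alpha * len(s)))
    return sum(s[:k]) / k

# Hypothetical ensemble of K=3 Gaussian return models (mean, std):
# disagreement across members plays the role of epistemic uncertainty,
# the variance within each member the role of aleatory uncertainty.
random.seed(0)
ensemble = [(0.0, 1.0), (0.5, 2.0), (-0.5, 1.5)]
pooled = [random.gauss(mu, sd) for mu, sd in ensemble for _ in range(10_000)]

# One way to fold both uncertainties into a single tail-risk number:
# take CVaR over the pooled draws from all ensemble members.
composite_cvar = cvar(pooled, alpha=0.05)
```

Pooling before taking CVaR evaluates the tail of the full mixture, whereas averaging per-member CVaRs (an additive-style combination) can only see each component's tail in isolation, which is one intuition for why an additive scheme may understate mixture tail risk.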
Language:
English
Collections:
Source:
Submission date:
2021-11-13T02:47:31Z
Files
- https://hal.archives-ouvertes.fr/hal-03150823/document
- Open access