Methods of Intrinsic Motivation in Model-based Reinforcement Learning Problems

Artem K. Latyshev; Латышев Артем Константинович; Aleksandr I. Panov; Панов Александр Игоревич

doi:10.14357/20718594230309

Authors: Latyshev A.K.¹, Panov A.I.²,³
Affiliations:
1. Moscow Institute of Physics and Technology (National Research University)
2. Federal Research Center “Computer Science and Control” RAS
3. Autonomous Non-Profit Organization Artificial Intelligence Research Institute (AIRI)
Issue: No 3 (2023)
Pages: 84-97
Section: Machine Learning, Neural Networks
URL: https://journal-vniispk.ru/2071-8594/article/view/270351
DOI: https://doi.org/10.14357/20718594230309
ID: 270351

Abstract

The reinforcement learning approach offers a wide range of methods for controlling intelligent agents. However, the problem of training an agent from sparse rewards remains relevant. One possible solution is to use methods of intrinsic motivation, an idea that comes from developmental psychology. Intrinsic motivation explains human behavior in the absence of extrinsic controlling stimuli. In this article, we review existing methods of determining intrinsic motivation based on a learned world model. We propose a systematization of these methods into three classes, which differ in the agent component to which the world model is applied: the reward system, the exploration policy, or intrinsic goals. We present a unified framework for describing the architecture of an agent that uses a world model and intrinsic motivation to improve learning. The prospects for development in this field of study are analyzed.
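
As a rough illustration of the first class (the world model applied to the reward system), the sketch below computes an intrinsic reward as the prediction error of a learned world model and adds it to the extrinsic reward. It is a minimal sketch under stated assumptions, not the method of any surveyed paper: the linear model, the names `LinearWorldModel`, `intrinsic_reward`, and `total_reward`, and the weight `beta` are all hypothetical and chosen only for illustration.

```python
import numpy as np


class LinearWorldModel:
    """Toy world model (assumption): next_state is predicted as W @ [state, action]."""

    def __init__(self, state_dim, action_dim, lr=0.1):
        self.W = np.zeros((state_dim, state_dim + action_dim))
        self.lr = lr

    def predict(self, state, action):
        return self.W @ np.concatenate([state, action])

    def update(self, state, action, next_state):
        # One gradient step on 0.5 * ||next_state - W x||^2.
        x = np.concatenate([state, action])
        error = next_state - self.W @ x
        self.W += self.lr * np.outer(error, x)


def intrinsic_reward(model, state, action, next_state):
    # Curiosity-style bonus: squared error of the world model's prediction.
    error = next_state - model.predict(state, action)
    return float(np.sum(error ** 2))


def total_reward(extrinsic, intrinsic, beta=0.5):
    # beta (hypothetical) trades off the task reward against the exploration bonus.
    return extrinsic + beta * intrinsic


# Usage: the bonus shrinks as the model learns a repeatedly seen transition.
model = LinearWorldModel(state_dim=2, action_dim=1)
s, a, s_next = np.array([1.0, 0.0]), np.array([1.0]), np.array([0.0, 1.0])
bonus_before = intrinsic_reward(model, s, a, s_next)
for _ in range(20):
    model.update(s, a, s_next)
bonus_after = intrinsic_reward(model, s, a, s_next)
```

With sparse extrinsic rewards, the bonus is large for transitions the model predicts poorly and decays for familiar ones, which is the general intuition behind prediction-error curiosity.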

Keywords

intrinsic motivation, reinforcement learning, world model, environment exploration

About the authors

Artem K. Latyshev

Moscow Institute of Physics and Technology (National Research University)

Email: latyshev.ak@phystech.edu

Graduate student. Engineer

Russian Federation, Dolgoprudny, Moscow Region

Aleksandr I. Panov

Federal Research Center “Computer Science and Control” RAS; Autonomous Non-Profit Organization Artificial Intelligence Research Institute (AIRI)

Author for correspondence.
Email: pan@isa.ru

Candidate of physical and mathematical sciences. Leading Researcher; Leading Researcher

Russian Federation, Moscow; Moscow

Supplementary files

2. Fig. 1. Possible ways of connecting states and actions in the model. Sn is a context of n states; Z is a latent space of representations; Mk is an ensemble of k individual models; Ah is a sequence of h actions; , is the state and representation in which the agent will find itself after h steps from the initial

3. Fig. 2. Levels of an intrinsically motivated agent. At each level, intrinsic motivation methods offer their own analogue: strategies, rewards, or goals

4. Fig. 3. Collection of training data. The target and exploratory strategies collect training data from the world model and environment into the memory Dg, Dε, DM

5. Fig. 4. Intrinsic motivation methods explicitly modify the agent's reward (left) and implement an exploratory strategy by explicitly adjusting the agent's strategy (right)

6. Fig. 5. The world model defines goals explicitly, based on the morphology of the environment, and implicitly, through an exploratory strategy whose reached states become goals
