Methods of Intrinsic Motivation in Model-based Reinforcement Learning Problems
- Authors: Latyshev A.K. (1), Panov A.I. (2, 3)
- Affiliations:
  1. Moscow Institute of Physics and Technology (National Research University)
  2. Federal Research Center “Computer Science and Control” RAS
  3. Autonomous Non-Profit Organization Artificial Intelligence Research Institute (AIRI)
- Issue: No 3 (2023)
- Pages: 84-97
- Section: Machine Learning, Neural Networks
- URL: https://journal-vniispk.ru/2071-8594/article/view/270351
- DOI: https://doi.org/10.14357/20718594230309
- ID: 270351
Abstract
Reinforcement learning offers a wide range of methods for controlling intelligent agents. However, training an agent from sparse rewards remains a challenge. One possible solution is to use methods of intrinsic motivation, an idea that originates in developmental psychology, where it explains human behavior in the absence of extrinsic controlling stimuli. In this article, we review existing methods for defining intrinsic motivation on the basis of a learned world model. A systematization of these methods into three classes is proposed; the classes differ in which agent component the world model is applied to: the reward system, the exploration policy, or intrinsic goals. We present a unified framework for describing the architecture of an agent that uses a world model and intrinsic motivation to improve learning. Prospects for further development of this field are analyzed.
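As an illustration of the first class described in the abstract (the world model applied to the reward system), the following is a minimal curiosity-style sketch: the prediction error of a learned forward model serves as an intrinsic bonus added to the sparse extrinsic reward. This is not the paper's method; all module names, dimensions, and the scaling coefficient are hypothetical and chosen only for the example.

```python
# Illustrative sketch (not from the article): a learned forward world model's
# prediction error is used as an intrinsic reward bonus, encouraging the agent
# to visit transitions the model predicts poorly. Names and sizes are hypothetical.
import torch
import torch.nn as nn

class ForwardWorldModel(nn.Module):
    """Predicts the next state from the current state and action."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def intrinsic_reward(model, state, action, next_state):
    """Per-transition intrinsic bonus: squared prediction error of the world model."""
    with torch.no_grad():
        pred = model(state, action)
    return ((pred - next_state) ** 2).mean(dim=-1)

# Usage: combine the bonus with the (possibly sparse) extrinsic reward.
model = ForwardWorldModel(state_dim=8, action_dim=2)
s, a, s_next = torch.randn(32, 8), torch.randn(32, 2), torch.randn(32, 8)
r_int = intrinsic_reward(model, s, a, s_next)
r_total = 0.0 + 0.1 * r_int  # extrinsic reward (zero here) plus scaled bonus
```

In the other two classes surveyed in the article, the same learned world model would instead be used to plan exploratory behavior directly (exploration policy) or to propose and pursue self-generated goals (intrinsic goals).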

About the authors
Artem K. Latyshev
Moscow Institute of Physics and Technology (National Research University)
Email: latyshev.ak@phystech.edu
Graduate student. Engineer
Russian Federation, Dolgoprudny, Moscow Region
Aleksandr I. Panov
Federal Research Center “Computer Science and Control” RAS; Autonomous Non-Profit Organization Artificial Intelligence Research Institute (AIRI)
Author for correspondence.
Email: pan@isa.ru
Candidate of Physical and Mathematical Sciences. Leading Researcher at both affiliations
Russian Federation, Moscow