HEVERL – Viewport Estimation Using Reinforcement Learning for 360-degree Video Streaming
- Authors: Hung N., Dat P.T., Tan N., Quan N.A., Trang L., Nam L.M.
- Affiliations:
- East Asia University of Technology
- Issue: Vol. 24, No. 1 (2025)
- Pages: 302-328
- Section: Artificial intelligence, knowledge and data engineering
- URL: https://journal-vniispk.ru/2713-3192/article/view/278230
- DOI: https://doi.org/10.15622/ia.24.1.11
- ID: 278230
Abstract
360-degree video content has become a pivotal component of virtual reality environments, offering viewers an immersive and engaging experience. However, streaming such comprehensive video content presents significant challenges due to large file sizes and varying network conditions. To address these challenges, view-adaptive streaming has emerged as a promising solution aimed at reducing the burden on network capacity. This technique streams lower-quality video for peripheral views while delivering high-quality content for the specific viewport the user is actively watching. It therefore requires accurately predicting the user's viewing direction and enhancing the quality of that particular segment, underscoring the significance of Viewport Adaptive Streaming (VAS). Our research investigates the application of incremental learning techniques to predict the scores required by the VAS system, optimizing the streaming process by ensuring that the most relevant portions of the video are rendered in high quality. Furthermore, our approach is augmented by a thorough analysis of human head and facial movement behaviors. Leveraging these insights, we have developed a reinforcement learning model specifically designed to anticipate user view directions and improve the experience quality in targeted regions. The effectiveness of the proposed method is evidenced by our experimental results, which show significant improvements over existing reference methods. Specifically, our approach enhances the Precision metric by 0.011 to 0.022, reduces the Root Mean Square Error (RMSE) by 0.008 to 0.013 and the Mean Absolute Error (MAE) by 0.012 to 0.018, and improves the F1-score by 0.017 to 0.028. Furthermore, we observe an increase in overall accuracy of 2.79 to 16.98.
These improvements highlight the potential of our model to significantly enhance the viewing experience in virtual reality environments, making 360-degree video streaming more efficient and user-friendly.
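As a rough illustration of the viewport-adaptive idea described in the abstract (not the paper's actual algorithm), the sketch below assigns quality levels to tiles of an equirectangular grid given a predicted view direction: high quality inside the predicted viewport, medium in a margin around it, low elsewhere. All function names, the grid size, and the field-of-view value are illustrative assumptions.

```python
import math

def tile_centers(rows, cols):
    """Yield (row, col, yaw, pitch) for each tile of an equirectangular
    grid; yaw in [-180, 180), pitch in [-90, 90)."""
    for r in range(rows):
        for c in range(cols):
            yaw = -180 + (c + 0.5) * 360 / cols
            pitch = 90 - (r + 0.5) * 180 / rows
            yield r, c, yaw, pitch

def angular_distance(yaw1, pitch1, yaw2, pitch2):
    """Great-circle angle in degrees between two view directions."""
    y1, p1, y2, p2 = map(math.radians, (yaw1, pitch1, yaw2, pitch2))
    cos_d = (math.sin(p1) * math.sin(p2)
             + math.cos(p1) * math.cos(p2) * math.cos(y1 - y2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_d))))

def assign_qualities(pred_yaw, pred_pitch, rows=4, cols=8, fov_deg=100):
    """Map each tile to a quality level: 2 (high) if its centre lies
    inside the predicted viewport, 1 (medium) within a margin around
    it, 0 (low) elsewhere."""
    qualities = {}
    for r, c, yaw, pitch in tile_centers(rows, cols):
        d = angular_distance(pred_yaw, pred_pitch, yaw, pitch)
        if d <= fov_deg / 2:
            qualities[(r, c)] = 2
        elif d <= fov_deg:
            qualities[(r, c)] = 1
        else:
            qualities[(r, c)] = 0
    return qualities

# Predicted view direction straight ahead: nearby tiles get level 2,
# tiles behind the viewer get level 0.
q = assign_qualities(0.0, 0.0)
```

In a real VAS pipeline the predicted direction would come from the head-movement model, and each quality level would select a different bitrate representation of the tile.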
About the authors
N. Hung
East Asia University of Technology
Corresponding author.
Email: hungnv@eaut.edu.vn
Ky Phu - Ky Anh -
P. Dat
East Asia University of Technology
Email: 20212452@eaut.edu.vn
Vu Ninh – Kien Xuong -
N. Tan
East Asia University of Technology
Email: tan25102000@gmail.com
Trung Dung - Tien Lu -
N. Quan
East Asia University of Technology
Email: anhq46724@gmail.com
Gia Lam -
L. Trang
East Asia University of Technology
Email: tranglth@eaut.edu.vn
Phuong Tri - Thi Tran Phung - Dan Phuong -
L. Nam
East Asia University of Technology
Email: namlm@eaut.edu.vn
Phuong Trung - Thanh Oai -
