Stacked Hourglass Network with Additional Skip Connection for Human Pose Estimation
DOI:
https://doi.org/10.18100/ijamec.803330
Keywords:
human pose estimation, hourglass network, deep learning, computer vision
Abstract
Human pose estimation is the problem of localizing human joints in a single image, and it remains a challenge in computer vision. The hourglass network has been used in many studies to achieve good performance on this problem. For human pose estimation, both high-level and low-level features are important for understanding the whole human body. However, the vanilla hourglass network passes only high-level features to the next stack. We therefore propose a network structure that addresses this limitation with an additional skip connection. The proposed skip connection improves network performance by passing relatively low-level features to the next stack. Because the skip connection is a simple element-wise sum, it does not increase the number of parameters. We evaluate the proposed method on the well-known MPII human pose estimation data set, and the evaluation confirms that it improves the human pose estimation performance of the vanilla hourglass network.
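The idea in the abstract can be illustrated with a toy sketch in plain Python. It is not the authors' implementation: `ToyStack`, `run_stacks`, and the single-weight "stack" are hypothetical stand-ins for real hourglass modules, and feature maps are reduced to flat lists of floats. The sketch only shows that summing relatively low-level features into the input of the next stack changes what that stack sees without adding any learnable parameters.

```python
def elementwise_sum(a, b):
    """Parameter-free skip connection: add two same-shaped feature maps."""
    if len(a) != len(b):
        raise ValueError("feature maps must have the same shape")
    return [x + y for x, y in zip(a, b)]


class ToyStack:
    """Hypothetical stand-in for one hourglass stack (assumption, not the paper's module)."""

    def __init__(self, weight):
        self.weight = weight  # the only learnable parameter in this toy

    def num_parameters(self):
        return 1

    def forward(self, features):
        # "Low-level" features: an early, lightly processed representation.
        low_level = [self.weight * x for x in features]
        # "High-level" features: further processing of the low-level ones.
        high_level = [max(0.0, v - 1.0) for v in low_level]
        return low_level, high_level


def run_stacks(stacks, features, use_extra_skip):
    """Chain stacks; optionally sum low-level features into the next stack's input."""
    high = features
    for i, stack in enumerate(stacks):
        low, high = stack.forward(features)
        is_last = i == len(stacks) - 1
        if use_extra_skip and not is_last:
            # Additional skip connection: element-wise sum, no new weights.
            features = elementwise_sum(high, low)
        else:
            # Vanilla behaviour: only high-level features move on.
            features = high
    return high


def total_parameters(stacks):
    return sum(stack.num_parameters() for stack in stacks)
```

Calling `run_stacks` on the same list of `ToyStack` objects with `use_extra_skip` set to `False` and then `True` gives different outputs, while `total_parameters` returns the same value in both cases, which mirrors the abstract's claim that the extra connection is free in terms of parameter count.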
License
Copyright (c) 2021 International Journal of Applied Methods in Electronics and Computers
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.