In this article, we will discuss pose estimation and its applications.
When a machine uses computer vision to detect the shape and structure of the human body, it is known as pose estimation.
Table of contents:
- Introduction to Pose Estimation
- How are pose estimation models/networks trained?
- What are the different approaches used for human pose estimation?
- What is 3D pose estimation?
- Applications of pose estimation
Introduction to Pose Estimation
Pose estimation is used in fields such as gaming, healthcare, motion capture and sports. In simple words, pose estimation is the localization of the joints of the human body.
There are three types of pose estimation models:
- Skeleton-based pose estimation
- Contour-based pose estimation
- Volume-based pose estimation
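As a rough illustration of the skeleton-based representation, a pose can be stored as a set of named joints plus the limbs that connect them. The sketch below is only an assumption about how such a structure might look: the joint names, coordinates and the `limb_lengths` helper are hypothetical and do not come from any particular library.

```python
from math import hypot

# Hypothetical keypoint names and pixel coordinates (x, y) for one person.
keypoints = {
    "left_shoulder": (120.0, 80.0),
    "left_elbow": (120.0, 130.0),
    "left_wrist": (150.0, 170.0),
}

# Edges of the skeleton: pairs of keypoints joined by a limb.
limbs = [("left_shoulder", "left_elbow"), ("left_elbow", "left_wrist")]


def limb_lengths(keypoints, limbs):
    """Return the pixel length of every limb in the skeleton."""
    lengths = {}
    for a, b in limbs:
        (ax, ay), (bx, by) = keypoints[a], keypoints[b]
        lengths[(a, b)] = hypot(bx - ax, by - ay)
    return lengths


print(limb_lengths(keypoints, limbs))
```

Contour-based and volume-based models would replace the joint dictionary with an outline or a 3D shape, but the idea of describing the body with a small, fixed structure stays the same.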
How are pose estimation models/networks trained?
The model/network is provided with images as inputs. These images are labelled: each label specifies the action or pose performed by the person and marks the positions of the different joints. One of the most widely used datasets is the MPII Human Pose dataset, which consists of around 25K images covering 410 human activities.
The set of joints used as keypoints depends on the dataset the model is trained on; for example, MPII annotates 16 body joints and COCO annotates 17 keypoints per person. These keypoints are usually the nose, left eye, right eye, left hip, right hip, left shoulder, right shoulder and so on. The main challenge in detecting human pose is the sheer variety of poses the human body can take, ranging from simple to complex. Clothing and the varying lighting conditions of the input data add to the difficulty.
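To make the idea of labelled training data more concrete, here is a minimal sketch of one annotated sample and of an error measure a regression-style model could minimise during training. Everything in it is an assumption for illustration: the `LabelledSample` fields, the file name and the `mean_squared_joint_error` function are hypothetical and do not mirror the actual MPII or COCO annotation format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LabelledSample:
    """One annotated person; all field names are illustrative."""
    image_path: str
    activity: str
    joints: List[Tuple[float, float]]  # (x, y) in pixels, one per joint
    visible: List[bool]                # False when a joint is occluded


def mean_squared_joint_error(predicted, sample):
    """Average squared pixel distance over the visible joints only."""
    total, count = 0.0, 0
    for (px, py), (tx, ty), vis in zip(predicted, sample.joints, sample.visible):
        if vis:
            total += (px - tx) ** 2 + (py - ty) ** 2
            count += 1
    return total / count if count else 0.0


sample = LabelledSample(
    image_path="example.jpg",  # placeholder, not a real dataset file
    activity="running",
    joints=[(100.0, 50.0), (90.0, 120.0), (110.0, 120.0)],
    visible=[True, True, False],
)
prediction = [(103.0, 54.0), (90.0, 120.0), (0.0, 0.0)]
print(mean_squared_joint_error(prediction, sample))
```

Skipping occluded joints is one common way to handle the clothing and occlusion problems mentioned above, since the model is not penalised for joints that the annotator could not see.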
What are the different approaches used for human pose estimation?
There are different approaches to human pose estimation, such as PoseNet, DeepPose and OpenPose. Which network performs best depends on the application.
- OpenPose - OpenPose first detects the parts of the human body and then groups them into people, which makes it capable of multi-person pose estimation. The network has 7x7 convolution layers along with pooling operators. In the first stage, features are extracted from the image; in the second stage, part confidence maps and part affinity fields (PAFs) are generated. A confidence map gives, for each body part, a score between 0 and 1 at every pixel indicating how likely the part is to be located there. A part affinity field encodes the location and orientation of a limb as a 2D vector at each pixel. The confidence maps and the PAFs are used to form bipartite graphs between candidate body parts, and matching on these graphs combines the parts into the final poses.
- PoseNet - PoseNet is widely used in features like gesture control, which is one of the applications of pose estimation. The architecture figures usually quoted for it, a 22-layer convolutional network with six ‘inception modules’ and two additional intermediate classifiers, describe the GoogLeNet-based PoseNet that was designed for camera pose regression. In that network the softmax classifiers are replaced with affine regressors, and the fully connected layers output a pose vector of 7 dimensions (a 3D position and a 4D orientation quaternion).
- DeepPose - DeepPose was the first model to use a deep neural network for human pose estimation. The network has 7 learnable layers (5 convolutional and 2 fully connected) along with pooling layers; only the convolutional and fully connected layers have learnable parameters. Each of these layers consists of a linear transformation followed by a non-linear rectified linear unit (ReLU).
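The two OpenPose outputs described above can be illustrated with a toy example. The sketch below is not the OpenPose implementation: it assumes a Gaussian-shaped confidence map and scores a candidate limb by averaging how well a part affinity field lines up with the segment between two joints. The function names, the `sigma` value and the toy field are all hypothetical.

```python
from math import exp, hypot


def confidence_at(pixel, keypoint, sigma=2.0):
    """Gaussian confidence in (0, 1] that peaks at the keypoint location."""
    d2 = (pixel[0] - keypoint[0]) ** 2 + (pixel[1] - keypoint[1]) ** 2
    return exp(-d2 / (2.0 * sigma ** 2))


def limb_direction(start, end):
    """Unit 2D vector pointing from one joint of a limb to the other."""
    dx, dy = end[0] - start[0], end[1] - start[1]
    length = hypot(dx, dy)
    return (dx / length, dy / length)


def connection_score(paf_at, start, end, samples=10):
    """Average alignment between a PAF and the segment start -> end.

    paf_at is any function mapping (x, y) to a 2D vector.
    """
    ux, uy = limb_direction(start, end)
    total = 0.0
    for i in range(samples):
        t = i / (samples - 1)
        x = start[0] + t * (end[0] - start[0])
        y = start[1] + t * (end[1] - start[1])
        vx, vy = paf_at((x, y))
        total += vx * ux + vy * uy
    return total / samples


# Toy field: a horizontal forearm running along the row y = 5.
def toy_paf(point):
    return (1.0, 0.0) if abs(point[1] - 5.0) <= 1.0 else (0.0, 0.0)


elbow = (2.0, 5.0)
good_wrist = (12.0, 5.0)   # lies along the limb
bad_wrist = (2.0, 15.0)    # perpendicular to the field

print(confidence_at(elbow, elbow))
print(connection_score(toy_paf, elbow, good_wrist))
print(connection_score(toy_paf, elbow, bad_wrist))
```

Scores of this kind can serve as edge weights in the bipartite graph between candidate elbows and candidate wrists, so that the matching step keeps the pairs that the field supports and discards the rest.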
What is 3D pose estimation?
3D pose estimation is the localization of human joints in 3D space. Datasets like DensePose, SURREAL and UP-3D are used to train models/networks for 3D pose estimation. Networks like OpenPose-3D enable real-time detection of the human body in 3D. OpenPose is also used for 3D pose estimation from 2D images. 3D pose estimation is quite useful when we want to enable human-machine interaction.
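One way to see why recovering 3D joints from a 2D image is hard is to look at how a camera maps 3D points to pixels. The sketch below assumes a simple pinhole camera; the focal length, image centre and joint coordinates are made-up values chosen only for illustration.

```python
def project(joint_3d, focal_length=1000.0, center=(640.0, 360.0)):
    """Project a 3D joint (X, Y, Z) in camera coordinates to pixel (u, v)."""
    x, y, z = joint_3d
    if z <= 0:
        raise ValueError("joint must be in front of the camera")
    u = focal_length * x / z + center[0]
    v = focal_length * y / z + center[1]
    return (u, v)


near_wrist = (0.5, 0.25, 2.0)  # metres, 2 m from the camera
far_wrist = (1.0, 0.5, 4.0)    # twice as far and twice as offset

print(project(near_wrist))
print(project(far_wrist))
```

Both wrists land on the same pixel even though they are at different depths. A 3D pose estimator therefore needs extra information, such as several camera views or priors learned from 3D training data, to decide which depth is the right one.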
Applications of pose estimation
Gaming and animations: Human pose estimation techniques help in making virtual reality games, as they give users a more realistic experience. In addition, some games use pose estimators to make the movements of game characters more realistic. 3D pose estimation could also replace some of the CGI techniques currently in use, as it would cost less.
Robotics: Pose estimation enables robots to see and make decisions based on human poses and actions. Gesture control is one application that many of us already use today. Pose estimation is also heavily used in automation.
Sports: Many sports facilities use such systems to train their players, so that they can correct their form, get better results and reduce injuries. Sports teams spend millions of dollars on such technologies.
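As an example of how estimated keypoints can be turned into feedback on an athlete's form, the sketch below computes the angle at a joint from three 2D keypoints. It is a minimal illustration that assumes the keypoints are already available as pixel coordinates; the `joint_angle` function and the sample coordinates are hypothetical.

```python
from math import acos, degrees, hypot


def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by the segments b->a and b->c."""
    v1 = (a[0] - b[0], a[1] - b[1])
    v2 = (c[0] - b[0], c[1] - b[1])
    n1, n2 = hypot(*v1), hypot(*v2)
    if n1 == 0 or n2 == 0:
        raise ValueError("joints must not coincide")
    cosine = (v1[0] * v2[0] + v1[1] * v2[1]) / (n1 * n2)
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    return degrees(acos(cosine))


# Hip, knee and ankle of a bent leg, in pixel coordinates.
hip, knee, ankle = (100.0, 100.0), (100.0, 200.0), (200.0, 200.0)
print(joint_angle(hip, knee, ankle))
```

A coaching tool could track such an angle frame by frame, for instance to check how deep a squat goes or whether a knee bends beyond a range considered safe.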
This article at OpenGenus was a brief explanation of human pose estimation.
To summarize, human pose estimation is a way of detecting humans and their poses from images or videos, which opens a way for human-machine interaction. Researchers are still trying to utilize datasets like COCO, MPII and ETH SfM in a more efficient way.