Human Pose Estimation from Images

This project implements a full pose estimation pipeline using keypoints from the COCO dataset. It includes data preprocessing, heatmap generation, model training, and evaluation using the PCK@0.2 metric.

🧠 Project Goals

Predict 17 keypoints of a human body (COCO format)
Use heatmaps as supervision (Gaussian blobs)
Explore soft-argmax and coordinate losses
Evaluate using PCK@0.2
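
Heatmap supervision can be sketched as follows. This is a minimal NumPy illustration, not the code in `utils/heatmap_generator.py`: the function name `gaussian_heatmaps`, the `sigma` default, and the choice to leave unlabeled joints as all-zero maps are assumptions made for the example.

```python
import numpy as np


def gaussian_heatmaps(keypoints, visibility, height, width, sigma=2.0):
    """Render one Gaussian blob per labeled keypoint.

    keypoints:  (K, 2) sequence of (x, y) in heatmap pixel coordinates.
    visibility: (K,) sequence; 0 means "not labeled" (COCO convention).
    Returns a (K, height, width) float32 array whose peaks equal 1.0.
    """
    keypoints = np.asarray(keypoints, dtype=np.float32)
    # Pixel coordinate grids: ys varies along rows, xs along columns.
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float32)
    heatmaps = np.zeros((len(keypoints), height, width), dtype=np.float32)
    for k, ((x, y), v) in enumerate(zip(keypoints, visibility)):
        if v == 0:
            continue  # assumption: unlabeled joints keep an all-zero map
        dist_sq = (xs - x) ** 2 + (ys - y) ** 2
        heatmaps[k] = np.exp(-dist_sq / (2.0 * sigma ** 2))
    return heatmaps


# Two joints on a 96x72 (height x width) grid; the second is unlabeled.
hm = gaussian_heatmaps([[10, 20], [0, 0]], [2, 0], height=96, width=72)
```

Keypoints annotated in input-image pixels would need to be scaled to the heatmap grid first (for example by 72/192 in x and 96/256 in y for a 96×72 output), so that the targets match the model output resolution.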

📂 Project Structure

pose-estimation-project/
├── README.md
├── dataset
│   ├── annotations
│   ├── coco_subset
│   ├── coco_subset_large
│   ├── generate_subset.py
│   ├── train2017
│   └── val2017
├── output
├── output_200_image
├── output_epoch30
├── src
│   ├── checkpoints
│   ├── pck.py
│   ├── pose_model.pth
│   ├── simple_pose_net.py
│   ├── train.py
│   ├── unet_pose_net.py
│   └── unit_test_simple_pose_net.py
└── utils
    ├── debug_visualize_heatmaps.py
    ├── decode_keypoints.py
    ├── heatmap_generator.py
    ├── heatmaps_decoder.py
    ├── soft_argmax.py
    ├── test_heatmap_decoder.py
    ├── visualiser.py
    └── visualize_predictions.py

🧪 Dataset

Subset of COCO Keypoints 2017
Filtered for images with at least one visible person
Subset sizes used: 200 and 2000 images
Input resolution: 256×192 (height × width)
Output (heatmap) resolution: 96×72 or 256×192 (height × width)
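
The filtering step can be sketched roughly as below. This is not the contents of `dataset/generate_subset.py`: the function name `filter_person_images`, the `max_images` parameter, and the reading of "visible person" as "a non-crowd annotation with at least one labeled keypoint" are assumptions. The dictionary layout (`images`, `annotations`, `image_id`, `num_keypoints`, `iscrowd`) follows the standard COCO keypoint annotation JSON.

```python
def filter_person_images(coco, min_keypoints=1, max_images=None):
    """Keep images that have at least one person with labeled keypoints.

    coco: dict loaded from a COCO person-keypoints annotation file.
    Returns a new dict with filtered "images" and "annotations" lists.
    """
    keep = set()
    for ann in coco["annotations"]:
        if ann.get("iscrowd", 0):
            continue  # assumption: crowd regions are not usable as targets
        if ann.get("num_keypoints", 0) >= min_keypoints:
            keep.add(ann["image_id"])

    images = [img for img in coco["images"] if img["id"] in keep]
    if max_images is not None:
        images = images[:max_images]  # e.g. 200 or 2000 for the subsets
    ids = {img["id"] for img in images}
    annotations = [a for a in coco["annotations"] if a["image_id"] in ids]
    return {**coco, "images": images, "annotations": annotations}


# Tiny hand-made example in the COCO layout.
toy = {
    "images": [{"id": 1}, {"id": 2}, {"id": 3}],
    "annotations": [
        {"image_id": 1, "num_keypoints": 5},
        {"image_id": 2, "num_keypoints": 0},
        {"image_id": 3, "num_keypoints": 4, "iscrowd": 1},
    ],
}
subset = filter_person_images(toy)
```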

🔧 Training Setup

Encoder: Pretrained ResNet18
Decoder:
- SimplePoseNet: 3-layer upsampling
- UNetPoseNet: U-Net with skip connections
Loss:
- BCEWithLogits + Soft-argmax L1 loss
- Joint-wise weighted coordinate loss
Optimizer: Adam (LR=1e-3)
Metric: PCK@0.2
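
The soft-argmax coordinate term in the loss can be illustrated with the sketch below. It uses NumPy to stay self-contained, whereas the project itself trains with PyTorch; the names `soft_argmax` and `masked_l1`, the temperature `beta`, and the exact form of the joint weighting are assumptions, not the code in `utils/soft_argmax.py` or `src/train.py`.

```python
import numpy as np


def soft_argmax(logits, beta=1.0):
    """Differentiable keypoint decoding: expected (x, y) under a softmax.

    logits: (K, H, W) raw heatmap scores. Returns a (K, 2) array of (x, y).
    """
    k, h, w = logits.shape
    flat = logits.reshape(k, -1) * beta
    flat = flat - flat.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(flat)
    probs = probs / probs.sum(axis=1, keepdims=True)
    probs = probs.reshape(k, h, w)
    xs = np.arange(w, dtype=np.float64)
    ys = np.arange(h, dtype=np.float64)
    x = (probs.sum(axis=1) * xs).sum(axis=1)  # marginal over rows, then E[x]
    y = (probs.sum(axis=2) * ys).sum(axis=1)  # marginal over columns, then E[y]
    return np.stack([x, y], axis=1)


def masked_l1(pred_xy, gt_xy, visibility, joint_weights=None):
    """L1 coordinate loss averaged over labeled joints only.

    joint_weights (optional, shape (K,)) is a hypothetical per-joint weight.
    """
    mask = (np.asarray(visibility) > 0).astype(np.float64)
    if joint_weights is not None:
        mask = mask * np.asarray(joint_weights, dtype=np.float64)
    err = np.abs(np.asarray(pred_xy) - np.asarray(gt_xy)).sum(axis=1)
    return float((err * mask).sum() / max(mask.sum(), 1e-8))


# One joint with a sharp peak at column 4, row 3 of an 8x6 map.
logits = np.full((1, 8, 6), -50.0)
logits[0, 3, 4] = 50.0
xy = soft_argmax(logits)
loss = masked_l1(xy, np.array([[5.0, 3.0]]), visibility=[2])
```

In training, a term like this would be added to the heatmap loss with a balancing weight; the value of that weight is not stated here and would need tuning.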

📈 Results

Best PCK@0.2 on 2000 images: ~0.51
The model localizes the rough vicinity of each joint but not the fine arrangement between joints
Qualitative examples show heatmap blobs forming, but they are not yet well structured
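
For reference, PCK@0.2 counts a keypoint as correct when its distance to the ground truth is within 0.2 times a per-person reference length. The sketch below is one possible formulation, not the code in `src/pck.py`: the function name `pck` and the choice of reference length (passed in as `ref_size`, for example the longer side of the person bounding box) are assumptions, since the normalization used for the number above is not stated.

```python
import numpy as np


def pck(pred_xy, gt_xy, visibility, ref_size, threshold=0.2):
    """Fraction of labeled keypoints within threshold * ref_size of the GT.

    pred_xy, gt_xy: (N, K, 2) arrays of (x, y) coordinates.
    visibility:     (N, K) array; 0 marks unlabeled joints, which are skipped.
    ref_size:       (N,) per-person reference length (assumed normalization).
    """
    pred_xy = np.asarray(pred_xy, dtype=np.float64)
    gt_xy = np.asarray(gt_xy, dtype=np.float64)
    valid = np.asarray(visibility) > 0
    if not valid.any():
        return float("nan")  # nothing to evaluate
    dist = np.linalg.norm(pred_xy - gt_xy, axis=-1)            # (N, K)
    limit = threshold * np.asarray(ref_size, dtype=np.float64)[:, None]
    correct = dist <= limit                                     # (N, K)
    return float(correct[valid].mean())


# One person, three joints: exact hit, 10 px miss, and one unlabeled joint.
score = pck(
    pred_xy=[[[0, 0], [10, 0], [3, 4]]],
    gt_xy=[[[0, 0], [0, 0], [0, 0]]],
    visibility=[[2, 2, 0]],
    ref_size=[20.0],
)
```

With a reference length of 20 px the tolerance is 4 px, so one of the two labeled joints counts as correct.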

🧩 Key Observations

Soft-argmax decoder improved keypoint sharpness
U-Net decoder improved learning for difficult joints (ankles, wrists)
Ground-truth heatmap resolution must match model output resolution
Predicted keypoints are sometimes in the correct location but assigned to the wrong joint

🚫 Known Issues

Ground-truth (GT) annotations in COCO contain noisy or missing keypoints
The model sometimes predicts joints in the correct area but in the wrong order
The predicted skeleton structure is not yet coherent
Multi-person ambiguity: supervision is limited to the first visible person, so the model sometimes places keypoints on other individuals in multi-person images

📚 Lessons Learned

Visual debugging is crucial
Output resolution affects heatmap precision
Masking invisible keypoints stabilizes training
Balancing the coordinate loss against the heatmap loss helps

✍️ To Try Next

Structured refinement with graph-based loss
Hourglass or Transformer-style decoders
Bone length constraints
Pose refinement from initial prediction

👨‍💻 Author

Karan Anand, PhD
LinkedIn | GitHub

kanand-cfd/pose-estimation-project

Folders and files

Latest commit

History

Repository files navigation

Human Pose Estimation from Images

🧠 Project Goals

📂 Project Structure

🧪 Dataset

🔧 Training Setup

📈 Results

🧩 Key Observations

🚫 Known Issues

📚 Lessons Learned

✍️ To Try Next

👨‍💻 Author

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages