kanand-cfd/pose-estimation-project

Human Pose Estimation from Images

This project implements a full pose estimation pipeline using keypoints from the COCO dataset. It includes data preprocessing, heatmap generation, model training, and evaluation using the PCK@0.2 metric.

🧠 Project Goals

  • Predict 17 keypoints of a human body (COCO format)
  • Use heatmaps as supervision (Gaussian blobs)
  • Explore soft-argmax and coordinate losses
  • Evaluate using PCK@0.2
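
As a rough illustration of the "Gaussian blobs" used for supervision, the sketch below renders one heatmap per keypoint. The function names, the `sigma` value, and the choice to leave unlabelled joints (COCO visibility 0) as all-zero maps are assumptions for illustration, not necessarily what `utils/heatmap_generator.py` does.

```python
import numpy as np

def gaussian_heatmap(x, y, height, width, sigma=2.0):
    """Return a (height, width) map with a Gaussian blob centred at (x, y)."""
    xs = np.arange(width, dtype=np.float32)[None, :]   # column coordinates
    ys = np.arange(height, dtype=np.float32)[:, None]  # row coordinates
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))

def keypoints_to_heatmaps(keypoints, height, width, sigma=2.0):
    """keypoints: (K, 3) rows of (x, y, v) in heatmap pixels, v = COCO visibility flag."""
    keypoints = np.asarray(keypoints, dtype=np.float32)
    heatmaps = np.zeros((len(keypoints), height, width), dtype=np.float32)
    for k, (x, y, v) in enumerate(keypoints):
        if v > 0:  # v == 0 means the joint is not labelled; keep its map empty
            heatmaps[k] = gaussian_heatmap(x, y, height, width, sigma)
    return heatmaps

# Two joints on a 96x72 (height x width) grid; the second one is unlabelled.
hm = keypoints_to_heatmaps([[10, 20, 2], [0, 0, 0]], height=96, width=72)
```

A larger `sigma` gives a broader, easier-to-learn target at the cost of localisation precision.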

📂 Project Structure

pose-estimation-project/
├── README.md
├── dataset
│   ├── annotations
│   ├── coco_subset
│   ├── coco_subset_large
│   ├── generate_subset.py
│   ├── train2017
│   └── val2017
├── output
├── output_200_image
├── output_epoch30
├── src
│   ├── checkpoints
│   ├── pck.py
│   ├── pose_model.pth
│   ├── simple_pose_net.py
│   ├── train.py
│   ├── unet_pose_net.py
│   └── unit_test_simple_pose_net.py
└── utils
    ├── debug_visualize_heatmaps.py
    ├── decode_keypoints.py
    ├── heatmap_generator.py
    ├── heatmaps_decoder.py
    ├── soft_argmax.py
    ├── test_heatmap_decoder.py
    ├── visualiser.py
    └── visualize_predictions.py

🧪 Dataset

  • Subset of COCO Keypoints 2017
  • Filtered for images with at least one visible person
  • Subset sizes used: 200 and 2000 images
  • Input resolution: 256×192
  • Output resolution: 96×72 or 192×256
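
Because keypoints are annotated in image pixels but supervised on a smaller heatmap grid, they have to be rescaled before heatmaps are drawn. The helper below is a minimal sketch of that step; it assumes the resolutions above are written as height×width, and `scale_keypoints` is a hypothetical name rather than a function from this repository.

```python
import numpy as np

def scale_keypoints(keypoints, in_size, out_size):
    """Rescale (x, y, v) rows from in_size=(H, W) pixels to out_size=(H, W) pixels."""
    kp = np.asarray(keypoints, dtype=np.float32).copy()
    sy = out_size[0] / in_size[0]  # vertical scale factor
    sx = out_size[1] / in_size[1]  # horizontal scale factor
    kp[:, 0] *= sx                 # x follows the width ratio
    kp[:, 1] *= sy                 # y follows the height ratio
    return kp                      # the visibility column is left untouched

# A joint at (x=96, y=128) in a 256x192 input, mapped onto a 96x72 heatmap grid.
scaled = scale_keypoints([[96, 128, 2]], in_size=(256, 192), out_size=(96, 72))
```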

🔧 Training Setup

  • Encoder: Pretrained ResNet18
  • Decoder:
    • SimplePoseNet: 3-layer upsampling
    • UNetPoseNet: U-Net with skip connections
  • Loss:
    • BCEWithLogits + Soft-argmax L1 loss
    • Joint-wise weighted coordinate loss
  • Optimizer: Adam (LR=1e-3)
  • Metric: PCK@0.2

📈 Results

  • Best PCK@0.2 on 2000 images: ~0.51
  • Model learns rough vicinity of joints, but not fine arrangement
  • Qualitative examples show blobs forming, but not well structured
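
For reference, PCK@0.2 counts a keypoint as correct when its prediction lies within 0.2 times a per-person reference length of the ground truth. The sketch below takes that reference length as an input (`ref_size`), since the README does not state which normaliser `src/pck.py` uses; bounding-box size and torso length are common choices. Restricting the score to visible joints is also an assumption.

```python
import numpy as np

def pck(pred_xy, gt_xy, visible, ref_size, alpha=0.2):
    """Fraction of visible keypoints with error <= alpha * ref_size.

    pred_xy, gt_xy: (N, K, 2); visible: (N, K) bool; ref_size: (N,) per-person scale.
    """
    pred_xy = np.asarray(pred_xy, dtype=np.float64)
    gt_xy = np.asarray(gt_xy, dtype=np.float64)
    visible = np.asarray(visible, dtype=bool)
    ref_size = np.asarray(ref_size, dtype=np.float64)

    dist = np.linalg.norm(pred_xy - gt_xy, axis=-1)  # (N, K) Euclidean errors
    correct = dist <= alpha * ref_size[:, None]      # per-person threshold
    n_visible = visible.sum()
    if n_visible == 0:
        return float("nan")                          # nothing to score
    return float((correct & visible).sum() / n_visible)

# One person, three joints: errors of 2 px and 30 px, plus one invisible joint.
gt = [[[10, 10], [50, 50], [0, 0]]]
pred = [[[12, 10], [80, 50], [100, 100]]]
score = pck(pred, gt, visible=[[True, True, False]], ref_size=[100.0])
```

With a reference length of 100 px the threshold is 20 px, so only the first joint counts as correct and the invisible joint is ignored.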

Predicted vs Ground Truth Keypoints:

  • SimplePoseNet with COCO dataset of 200 images after 10 epochs
  • SimplePoseNet with COCO dataset of 2000 images after 30 epochs
  • UNet based model with COCO dataset of 2000 images after 10 epochs

🧩 Key Observations

  • Soft-argmax decoder improved keypoint sharpness
  • U-Net decoder improved learning for difficult joints (ankles, wrists)
  • Ground-truth heatmap resolution must match model output resolution
  • Predicted keypoints are sometimes correct in location but misordered
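
The soft-argmax decoder mentioned above can be sketched as a spatial softmax followed by an expected-coordinate computation, which yields sub-pixel and differentiable estimates. This is a NumPy illustration rather than the code in `utils/soft_argmax.py`; the `beta` temperature parameter is an assumption.

```python
import numpy as np

def soft_argmax(heatmap_logits, beta=1.0):
    """(K, H, W) logits -> (K, 2) expected (x, y) positions in heatmap pixels."""
    logits = np.asarray(heatmap_logits, dtype=np.float64)
    k, h, w = logits.shape
    flat = beta * logits.reshape(k, -1)
    flat -= flat.max(axis=1, keepdims=True)             # stabilise the exponent
    probs = np.exp(flat)
    probs /= probs.sum(axis=1, keepdims=True)           # per-joint spatial softmax
    probs = probs.reshape(k, h, w)
    x = (probs.sum(axis=1) * np.arange(w)).sum(axis=1)  # expectation over columns
    y = (probs.sum(axis=2) * np.arange(h)).sum(axis=1)  # expectation over rows
    return np.stack([x, y], axis=1)

# Joint 0: one sharp peak at (x=4, y=2). Joint 1: two equal peaks on row 1.
logits = np.zeros((2, 5, 7))
logits[0, 2, 4] = 50.0
logits[1, 1, 2] = 50.0
logits[1, 1, 4] = 50.0
xy = soft_argmax(logits)
```

The second joint decodes to roughly x = 3, midway between its two peaks. This is a known trade-off of soft-argmax: a multi-modal heatmap is averaged rather than resolved to one mode.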

🚫 Known Issues

  • GT annotations in COCO have noisy or missing keypoints
  • Model sometimes predicts joints in the correct area but wrong order
  • Predicted skeleton structure not yet coherent
  • Multiple person ambiguity: Supervision is limited to the first visible person, causing the model to sometimes place keypoints on other individuals in multi-person images.

📚 Lessons Learned

  • Visual debugging is crucial
  • Output resolution affects heatmap precision
  • Masking invisible keypoints stabilizes training
  • Balancing coordinate vs heatmap loss helps
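
The two training lessons above (masking invisible keypoints, balancing the coordinate and heatmap terms) can be illustrated with a small NumPy sketch of a combined loss. The function name, the `coord_weight` value, and the per-joint averaging are assumptions; the actual loss in `src/train.py` may be structured differently.

```python
import numpy as np

def bce_with_logits(logits, targets):
    """Numerically stable element-wise binary cross-entropy on raw logits."""
    return np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits)))

def masked_pose_loss(logits, gt_heatmaps, pred_xy, gt_xy, visible,
                     coord_weight=0.1, joint_weights=None):
    """logits, gt_heatmaps: (K, H, W); pred_xy, gt_xy: (K, 2); visible: (K,) bool."""
    logits = np.asarray(logits, dtype=np.float64)
    gt_heatmaps = np.asarray(gt_heatmaps, dtype=np.float64)
    pred_xy = np.asarray(pred_xy, dtype=np.float64)
    gt_xy = np.asarray(gt_xy, dtype=np.float64)

    w = np.asarray(visible, dtype=np.float64)          # 1 for visible joints, else 0
    if joint_weights is not None:
        w = w * np.asarray(joint_weights, dtype=np.float64)
    total_w = w.sum() if w.sum() > 0 else 1.0          # avoid division by zero

    heat = bce_with_logits(logits, gt_heatmaps).mean(axis=(1, 2))  # (K,) heatmap term
    coord = np.abs(pred_xy - gt_xy).sum(axis=1)                    # (K,) L1 coordinate term
    heat_loss = (w * heat).sum() / total_w
    coord_loss = (w * coord).sum() / total_w
    return heat_loss + coord_weight * coord_loss, heat_loss, coord_loss

# Two joints on a 2x2 grid; the second joint is invisible and must not contribute.
total, heat_loss, coord_loss = masked_pose_loss(
    logits=np.zeros((2, 2, 2)), gt_heatmaps=np.zeros((2, 2, 2)),
    pred_xy=[[1, 1], [5, 5]], gt_xy=[[0, 0], [0, 0]], visible=[True, False])
```

Passing `joint_weights` gives harder joints such as wrists and ankles a larger share of the loss, in the spirit of the joint-wise weighted coordinate loss listed under Training Setup.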

✍️ To Try Next

  • Structured refinement with graph-based loss
  • Hourglass or Transformer-style decoders
  • Bone length constraints
  • Pose refinement from initial prediction
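
One possible form of a bone length constraint is a penalty on left/right limb asymmetry, sketched below. It has not been tried in this project. The limb pairs use the standard 0-based COCO keypoint order; the function names and the choice of a symmetry penalty (rather than, say, a prior on absolute lengths) are assumptions.

```python
import numpy as np

# Limb pairs as 0-based COCO keypoint indices:
# 5/6 shoulders, 7/8 elbows, 9/10 wrists, 11/12 hips, 13/14 knees, 15/16 ankles.
LEFT_BONES = [(5, 7), (7, 9), (11, 13), (13, 15)]
RIGHT_BONES = [(6, 8), (8, 10), (12, 14), (14, 16)]

def bone_lengths(xy, bones):
    """xy: (17, 2) keypoints -> (len(bones),) Euclidean limb lengths."""
    xy = np.asarray(xy, dtype=np.float64)
    start = xy[[i for i, _ in bones]]
    end = xy[[j for _, j in bones]]
    return np.linalg.norm(start - end, axis=1)

def bone_symmetry_penalty(xy):
    """Mean absolute difference between matching left and right limb lengths."""
    diff = bone_lengths(xy, LEFT_BONES) - bone_lengths(xy, RIGHT_BONES)
    return float(np.abs(diff).mean())

# A symmetric stick figure: every limb segment is 10 px long.
pose = np.zeros((17, 2))
for left, right, y in [(5, 6, 0), (7, 8, 10), (9, 10, 20),
                       (11, 12, 30), (13, 14, 40), (15, 16, 50)]:
    pose[left] = (0, y)
    pose[right] = (10, y)
symmetric = bone_symmetry_penalty(pose)

stretched = pose.copy()
stretched[9] = (0, 30)  # left wrist moved: left forearm is now 20 px
asymmetric = bone_symmetry_penalty(stretched)
```

In 2D images, foreshortening makes true left and right limbs project to different lengths, so a term like this would probably need a small weight or a tolerance band.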

👨‍💻 Author

Karan Anand, PhD
LinkedIn | GitHub

About

Human Pose Estimation from Images using Convolutional Neural Networks
