🧠 Relational Group Activity Recognition

This project is a PyTorch implementation of a relational attention-based model for Group Activity Recognition, inspired by the research paper:

"A Hierarchical Deep Temporal Model for Group Activity Recognition" (arXiv:1511.06040v2)

The model combines person-level feature extraction (ResNet18) with graph relational attention and temporal modeling (LSTM) to recognize both individual and group-level actions in video sequences.


🚀 Features

  • 🧩 End-to-end attention-based temporal model (RCRG_R2_C11_conc_temporal)
  • 🔗 Graph Relational Attention (RelationalUnit)
  • 🧠 Pretrained ResNet18 for person-level features
  • 🔁 Temporal sequence modeling with LSTM
  • ⚙️ Configurable via YAML files
  • 🧪 Supports train/validation/test splits
  • 🧮 Includes sampler balancing and TTA (Test-Time Augmentation); a sampler-weight sketch follows this list
  • 💾 Automatic checkpoint saving
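
Sampler balancing is typically achieved by weighting each training sample inversely to its class frequency and passing those weights to torch.utils.data.WeightedRandomSampler. The snippet below is a minimal sketch of that idea, assuming integer class labels; it is not a copy of utils/data_utils/sampler_weights.py.

```python
from collections import Counter

import torch
from torch.utils.data import WeightedRandomSampler


def compute_sampler_weights(labels):
    """Return one weight per sample, inversely proportional to its class frequency."""
    counts = Counter(int(label) for label in labels)
    return torch.tensor([1.0 / counts[int(label)] for label in labels], dtype=torch.double)


# Toy group-activity labels; pass the sampler to the training DataLoader instead of shuffle=True.
labels = [0, 0, 0, 1, 2, 2]
weights = compute_sampler_weights(labels)
sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
# loader = DataLoader(train_dataset, batch_size=8, sampler=sampler)
```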

📂 Project Structure

group_activity_recognition/
│
├── configs/
│   └── attention_models/
│       └── RCRG_R1_C1_conc_temporal_end2end.yml     # Model & training config
│
├── models/
│   └── attention_models/
│       ├── relational_attention.py                  # Graph attention block
│       └── RCRG_R2_C11_conc_temporal.py             # Group Activity model
│
├── training/
│   ├── trainer.py                                   # Trainer logic
│   └── train_attention_model.py                     # Main training script
│
├── utils/
│   └── data_utils/
│       └── sampler_weights.py                       # Sampler weight computation
│
├── experiments/                                     # Training outputs (checkpoints, logs)
│
├── requirements.txt
└── README.md

🧩 Model Overview

🎯 Person Activity Classifier

Extracts per-person features using a pretrained ResNet18 followed by fully connected layers.
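
A minimal sketch of such a person-level backbone, assuming a recent torchvision and illustrative layer sizes (the names and dimensions are not necessarily those used in RCRG_R2_C11_conc_temporal.py):

```python
import torch.nn as nn
from torchvision import models


class PersonFeatureExtractor(nn.Module):
    """Pretrained ResNet18 backbone plus FC layers producing one embedding per person crop (sketch)."""

    def __init__(self, feature_dim=128, num_person_classes=9):
        super().__init__()
        backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
        backbone.fc = nn.Identity()               # keep the 512-d pooled features
        self.backbone = backbone
        self.fc = nn.Linear(512, feature_dim)     # person embedding used by the relational module
        self.person_head = nn.Linear(feature_dim, num_person_classes)

    def forward(self, crops):                     # crops: (num_persons, 3, 224, 224)
        feats = self.fc(self.backbone(crops))     # (num_persons, feature_dim)
        return feats, self.person_head(feats)     # embeddings + per-person action logits
```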

🔗 Relational Attention Module

Implements a multi-head attention mechanism across detected persons within the same frame to model inter-person interactions.
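
As a hedged illustration, the same idea can be expressed with PyTorch's built-in nn.MultiheadAttention applied over the person embeddings of one frame; the repository's RelationalUnit may be implemented differently (e.g. with torch_geometric), so treat the shapes and names below as assumptions.

```python
import torch
import torch.nn as nn


class RelationalAttention(nn.Module):
    """Multi-head self-attention across the persons detected in a frame (sketch)."""

    def __init__(self, feature_dim=128, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(feature_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feature_dim)

    def forward(self, person_feats):
        # person_feats: (batch, num_persons, feature_dim); each person attends to
        # every other person in the same frame to model their interactions.
        attended, _ = self.attn(person_feats, person_feats, person_feats)
        return self.norm(person_feats + attended)      # residual connection + LayerNorm


x = torch.randn(2, 12, 128)                            # 2 frames, 12 players, 128-d embeddings
print(RelationalAttention()(x).shape)                  # torch.Size([2, 12, 128])
```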

⏳ Group Activity Classifier

Uses LSTM layers to process temporal dependencies across frames and outputs group-level activity predictions.
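
Putting the pieces together, one common pattern is to pool the attended person features into a single vector per frame, run an LSTM over the frame sequence, and classify from the last hidden state. The sketch below assumes max-pooling over persons and illustrative dimensions; it is not the exact RCRG_R2_C11_conc_temporal architecture.

```python
import torch
import torch.nn as nn


class GroupActivityClassifier(nn.Module):
    """LSTM over per-frame group features, followed by a group-activity head (sketch)."""

    def __init__(self, feature_dim=128, hidden_dim=256, num_group_classes=8):
        super().__init__()
        self.lstm = nn.LSTM(feature_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_group_classes)

    def forward(self, person_feats):
        # person_feats: (batch, num_frames, num_persons, feature_dim)
        frame_feats = person_feats.max(dim=2).values   # pool persons into one vector per frame
        out, _ = self.lstm(frame_feats)                # model temporal dependencies across frames
        return self.head(out[:, -1])                   # group-activity logits from the last step


x = torch.randn(2, 10, 12, 128)                        # 2 clips, 10 frames, 12 players, 128-d features
print(GroupActivityClassifier()(x).shape)              # torch.Size([2, 8])
```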


⚙️ Installation

1. Clone the repository

git clone https://github.com/youcefgheffari3/group_activity_recognition.git
cd group_activity_recognition

2. Create a virtual environment

python -m venv venv
venv\Scripts\activate  # (Windows)
# source venv/bin/activate  # (Linux/Mac)

3. Install dependencies

pip install -r requirements.txt

If you have CUDA:

pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118

🧠 Training

Edit the configuration file configs/attention_models/RCRG_R1_C1_conc_temporal_end2end.yml to match your dataset paths.
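
The exact keys depend on the YAML file shipped with the repository; as a quick sanity check, you can load it with pyyaml (already in requirements.txt) and inspect the available options. The key names suggested in the comment below are assumptions, not the repo's actual schema.

```python
import yaml

with open("configs/attention_models/RCRG_R1_C1_conc_temporal_end2end.yml") as f:
    cfg = yaml.safe_load(f)

print(cfg.keys())   # e.g. dataset paths, batch size, epochs -- check the shipped file for the real names
```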

Then run:

python training/train_attention_model.py

Training checkpoints and logs will be saved automatically to:

experiments/attention_models/

📊 Evaluation

To evaluate the trained model (optionally with test-time augmentation, TTA), run:

python models/attention_models/RCRG_R2_C11_conc_temporal.py
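
Test-time augmentation generally means averaging the model's predictions over simple augmentations of each clip, such as a horizontal flip. The sketch below illustrates that idea under assumed input shapes and model interface; it is not the repository's exact TTA routine.

```python
import torch


@torch.no_grad()
def predict_with_tta(model, clip):
    """Average softmax scores over the original clip and its horizontally flipped copy (sketch).

    `clip` is assumed to be a tensor whose last dimension is image width, e.g.
    (frames, persons, 3, H, W); the repository's actual TTA transforms may differ.
    """
    model.eval()
    views = [clip, torch.flip(clip, dims=[-1])]                        # original + horizontal flip
    probs = [torch.softmax(model(v.unsqueeze(0)), dim=-1) for v in views]
    return torch.stack(probs).mean(dim=0)                              # averaged class probabilities
```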

📈 Example Output

During training, you'll see logs like:

Epoch [1/50] | Train Loss: 0.893 | Val Acc: 82.5% | F1: 0.83
Checkpoint saved: experiments/attention_models/epoch_01.pkl

TensorBoard logs are also available:

tensorboard --logdir experiments/attention_models/

🧩 Requirements

Main dependencies (from requirements.txt):

torch
torchvision
torch_geometric
albumentations
scikit-learn
opencv-python
tensorboard
pandas
numpy
matplotlib
pyyaml

🧑‍💻 Author

Gheffari Youcef Soufiane
Master's Student in Artificial Intelligence
University of Science and Technology of Oran Mohamed-Boudiaf (USTOMB)
📧 gheffari.youcef.soufiane@gmail.com


📚 Reference

If you use this implementation, please cite the original paper:

@article{ibrahim2016hierarchical,
  title={A Hierarchical Deep Temporal Model for Group Activity Recognition},
  author={Ibrahim, Mostafa S and Muralidharan, Srikanth and Deng, Zhiwei and Vahdat, Arash and Mori, Greg},
  journal={arXiv preprint arXiv:1511.06040},
  year={2016}
}

📝 License

This project is released for academic and research purposes under the MIT License.


🎓 Academic Context

This implementation was developed as a personal initiative to deepen my knowledge and strengthen my skills in computer vision. The project demonstrates:

  • Deep understanding of attention mechanisms and relational reasoning
  • Practical experience with PyTorch and computer vision architectures
  • Ability to reproduce and extend state-of-the-art research
  • End-to-end implementation from data processing to model evaluation

For internship or collaboration opportunities, feel free to reach out!
