This project is a PyTorch implementation of a relational attention-based model for Group Activity Recognition, inspired by the research paper:
"A Hierarchical Deep Temporal Model for Group Activity Recognition" (arXiv:1511.06040v2)
The model combines person-level feature extraction (ResNet18) with graph relational attention and temporal modeling (LSTM) to recognize both individual and group-level actions in video sequences.
- 🧩 End-to-end attention-based temporal model (`RCRG_R2_C11_conc_temporal`)
- 🔗 Graph Relational Attention (`RelationalUnit`)
- 🧠 Pretrained ResNet18 for person-level features
- 🔁 Temporal sequence modeling with LSTM
- ⚙️ Configurable via YAML files
- 🧪 Supports train/validation/test splits
- 🧮 Includes sampler balancing and TTA (Test-Time Augmentation); see the sampler sketch after this list
- 💾 Automatic checkpoint saving
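For the balanced sampler, the usual recipe is inverse-class-frequency weights fed to PyTorch's `WeightedRandomSampler`. Below is a minimal sketch of that idea (hypothetical `labels` input; the project's actual logic lives in `utils/data_utils/sampler_weights.py`):

```python
import torch
from collections import Counter
from torch.utils.data import WeightedRandomSampler

def make_balanced_sampler(labels):
    """Weight each sample by 1/count(its class) so rare classes are drawn as often as common ones."""
    counts = Counter(labels)
    weights = torch.tensor([1.0 / counts[y] for y in labels], dtype=torch.double)
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)

# Usage (hypothetical): DataLoader(train_set, batch_size=32, sampler=make_balanced_sampler(train_labels))
```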
```
group_activity_recognition/
│
├── configs/
│   └── attention_models/
│       └── RCRG_R1_C1_conc_temporal_end2end.yml   # Model & training config
│
├── models/
│   └── attention_models/
│       ├── relational_attention.py                # Graph attention block
│       └── RCRG_R2_C11_conc_temporal.py           # Group Activity model
│
├── training/
│   ├── trainer.py                                 # Trainer logic
│   └── train_attention_model.py                   # Main training script
│
├── utils/
│   └── data_utils/
│       └── sampler_weights.py                     # Sampler weight computation
│
├── experiments/
│   └── attention_models/                          # Training outputs (checkpoints, logs)
│
├── requirements.txt
└── README.md
```
- **Person feature extraction:** extracts per-person features using a pretrained ResNet18 followed by fully connected layers.
- **Relational attention:** implements a multi-head attention mechanism across detected persons within the same frame to model inter-person interactions.
- **Temporal modeling:** uses LSTM layers to process temporal dependencies across frames and outputs group-level activity predictions (see the sketch below).
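A minimal, self-contained sketch of this three-stage data flow (illustrative shapes and layer sizes only; the actual model is `models/attention_models/RCRG_R2_C11_conc_temporal.py`, and `nn.MultiheadAttention` stands in here for the repo's `RelationalUnit`):

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class GroupActivitySketch(nn.Module):
    """Illustrative pipeline: per-person CNN features -> attention over persons -> LSTM over frames."""
    def __init__(self, feat_dim=256, num_heads=4, num_classes=8):
        super().__init__()
        backbone = resnet18(weights="IMAGENET1K_V1")
        self.cnn = nn.Sequential(*list(backbone.children())[:-1])  # drop the classifier head
        self.fc = nn.Linear(512, feat_dim)
        self.attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.head = nn.Linear(feat_dim, num_classes)

    def forward(self, crops):
        # crops: (B, T, P, 3, H, W) = clips, frames, person crops per frame
        B, T, P = crops.shape[:3]
        x = self.cnn(crops.flatten(0, 2)).flatten(1)   # (B*T*P, 512) per-person features
        x = self.fc(x).view(B * T, P, -1)              # persons as a set within each frame
        x, _ = self.attn(x, x, x)                      # inter-person relational attention
        x = x.mean(dim=1).view(B, T, -1)               # pool persons -> one feature per frame
        x, _ = self.lstm(x)                            # temporal dependencies across frames
        return self.head(x[:, -1])                     # group-activity logits

logits = GroupActivitySketch()(torch.randn(2, 5, 12, 3, 224, 224))  # 2 clips, 5 frames, 12 persons
```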
```bash
git clone https://github.com/youcefgheffari3/group_activity_recognition.git
cd group_activity_recognition
python -m venv venv
venv\Scripts\activate        # Windows
# source venv/bin/activate   # Linux/Mac
pip install -r requirements.txt
```

If you have CUDA:

```bash
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu118
```

Edit `configs/attention_models/RCRG_R1_C1_conc_temporal_end2end.yml` to match your dataset paths.
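Since the config is plain YAML (and `pyyaml` is in `requirements.txt`), a quick way to sanity-check your edits before launching a long run:

```python
import yaml

# Load the config and eyeball the dataset paths before training.
with open("configs/attention_models/RCRG_R1_C1_conc_temporal_end2end.yml") as f:
    cfg = yaml.safe_load(f)
print(cfg)
```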
Then run:
```bash
python training/train_attention_model.py
```

Training checkpoints and logs will be saved automatically to `experiments/attention_models/`.
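Assuming the `.pkl` checkpoints are dictionaries written with `torch.save` (an assumption; the key names below are hypothetical), they can be inspected or reloaded like this:

```python
import torch

ckpt = torch.load("experiments/attention_models/epoch_01.pkl", map_location="cpu")
print(ckpt.keys())                      # see what the trainer actually saved
# model.load_state_dict(ckpt["model"])  # hypothetical key; adjust to the real one
```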
To evaluate the trained model (or perform TTA):

```bash
python models/attention_models/RCRG_R2_C11_conc_temporal.py
```
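TTA here typically means averaging predictions over simple augmentations such as a horizontal flip. A hedged sketch of the idea (hypothetical `model` and tensor layout, not the script's exact procedure):

```python
import torch

@torch.no_grad()
def predict_tta(model, clips):
    # clips: (..., H, W); average logits over the original and horizontally flipped views
    model.eval()
    logits = model(clips) + model(torch.flip(clips, dims=[-1]))
    return (logits / 2).argmax(dim=-1)
```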
During training, you'll see logs like:

```
Epoch [1/50] | Train Loss: 0.893 | Val Acc: 82.5% | F1: 0.83
Checkpoint saved: experiments/attention_models/epoch_01.pkl
```

TensorBoard logs are also available:

```bash
tensorboard --logdir experiments/attention_models/
```

Main dependencies (from `requirements.txt`):
```
torch
torchvision
torch_geometric
albumentations
scikit-learn
opencv-python
tensorboard
pandas
numpy
matplotlib
pyyaml
```
Gheffari Youcef Soufiane
Master's Student in Artificial Intelligence
University of Science and Technology of Oran Mohamed-Boudiaf (USTOMB)
📧 gheffari.youcef.soufiane@gmail.com
If you use this implementation, please cite the original paper:
```bibtex
@article{ibrahim2016hierarchical,
  title={A Hierarchical Deep Temporal Model for Group Activity Recognition},
  author={Ibrahim, Mostafa S. and Muralidharan, Srikanth and Deng, Zhiwei and Vahdat, Arash and Mori, Greg},
  journal={arXiv preprint arXiv:1511.06040},
  year={2016}
}
```

This project is released for academic and research purposes under the MIT License.
This implementation was developed as a personal initiative to deepen my knowledge and strengthen my skills in computer vision. The project demonstrates:
- Deep understanding of attention mechanisms and relational reasoning
- Practical experience with PyTorch and computer vision architectures
- Ability to reproduce and extend state-of-the-art research
- End-to-end implementation from data processing to model evaluation
For internship or collaboration opportunities, feel free to reach out!