A real-time Augmented Reality (AR) camera tracking and interpretation application that processes live webcam input to detect hand keypoints and interpret gestures using machine learning models.
This project implements a complete AR pipeline including keypoint extraction, gesture classification, motion history analysis, and real-time visualization.
The application captures live video from a webcam, detects hand landmarks, extracts keypoint features, and classifies both static and dynamic gestures. The interpreted output is displayed directly on the camera feed, enabling real-time human–computer interaction.
The system is modular and designed for experimentation, making it suitable for learning and extending AR and computer vision concepts.
Key features:
- Real-time webcam feed processing
- Hand and keypoint detection using computer vision
- Static gesture classification using keypoint features
- Dynamic gesture interpretation using point-history tracking
- Real-time visualization using OpenCV
- Pre-trained TensorFlow / TFLite models for inference
- Jupyter notebooks for model training and experimentation
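The dynamic-gesture feature above depends on tracking a fingertip position over recent frames. A rough sketch of how such a point-history buffer can work (the function names and buffer length are illustrative, not taken from the project code):

```python
from collections import deque

HISTORY_LEN = 16  # number of past fingertip positions to keep (assumed value)

def make_history():
    # Fixed-length buffer: the oldest point drops off as a new one arrives.
    return deque(maxlen=HISTORY_LEN)

def history_to_features(history, frame_w, frame_h):
    """Flatten the (x, y) history into one feature vector, expressed
    relative to the first point so the feature is translation-invariant,
    and scaled by the frame size so it is resolution-independent."""
    if not history:
        return []
    base_x, base_y = history[0]
    feats = []
    for x, y in history:
        feats.append((x - base_x) / frame_w)
        feats.append((y - base_y) / frame_h)
    return feats
```

A vector like this (2 × HISTORY_LEN values) is the kind of input a point-history classifier consumes.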
Built with:
- Python
- OpenCV
- MediaPipe
- NumPy
- TensorFlow / Keras
- Jupyter Notebook
Project structure:

AR-Camera-Tracking-and-Interpreter-App/
│
├── app.py
│
├── utils/
│ ├── cvfpscalc.py
│ └── __init__.py
│
├── model/
│ ├── keypoint_classifier/
│ │ ├── keypoint_classifier.py
│ │ ├── keypoint_classifier.tflite
│ │ ├── keypoint_classifier.hdf5
│ │ ├── keypoint_classifier_labels.csv
│ │
│ ├── point_history_classifier/
│ │ ├── point_history_classifier.py
│ │ ├── point_history_classifier.tflite
│ │ ├── point_history_classifier.hdf5
│ │ ├── point_history_classifier_labels.csv
│
├── keypoint_classification.ipynb
├── point_history_classification.ipynb
└── README.md
The pipeline consists of the following stages:

1. Camera Capture: captures frames from a live webcam feed using OpenCV.
2. Keypoint Detection: detects hand landmarks in each frame using MediaPipe.
3. Feature Processing:
   - Normalizes keypoint coordinates
   - Tracks point history across frames for motion analysis
4. Gesture Classification:
   - Static gestures are classified using the keypoint classifier
   - Dynamic gestures are interpreted using the point-history classifier
5. Visualization: classification results are rendered directly on the live video feed.
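The stages above can be condensed into a minimal capture loop. This is a simplified illustration, not the project's actual app.py: the wrist-relative, max-abs normalization in pre_process_landmark is one common way to normalize hand keypoints (an assumption here), and cv2/mediapipe are imported inside main() so the snippet stays importable without them installed.

```python
def pre_process_landmark(landmarks):
    """Normalize (x, y) hand landmarks: translate so the wrist
    (index 0) becomes the origin, then scale by the largest
    absolute coordinate so values fall in [-1, 1]."""
    base_x, base_y = landmarks[0]
    rel = [(x - base_x, y - base_y) for x, y in landmarks]
    flat = [v for point in rel for v in point]
    max_abs = max((abs(v) for v in flat), default=1.0) or 1.0
    return [v / max_abs for v in flat]

def main():
    # Deferred imports: the heavy dependencies are only needed
    # when the loop actually runs.
    import cv2
    import mediapipe as mp

    hands = mp.solutions.hands.Hands(max_num_hands=1)
    cap = cv2.VideoCapture(0)  # default webcam
    while cap.isOpened():
        ok, frame = cap.read()
        if not ok:
            break
        # MediaPipe expects RGB; OpenCV delivers BGR.
        results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
        if results.multi_hand_landmarks:
            lm = results.multi_hand_landmarks[0].landmark
            points = [(p.x, p.y) for p in lm]
            features = pre_process_landmark(points)
            # features would be fed to the keypoint classifier here,
            # and the predicted label drawn onto the frame.
        cv2.imshow("AR Tracking", frame)
        if cv2.waitKey(1) & 0xFF == 27:  # Esc quits
            break
    cap.release()
    cv2.destroyAllWindows()

if __name__ == "__main__":
    main()
```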
Prerequisites:
- Python 3.7 or higher
- Webcam access
Install required dependencies:
pip install opencv-python mediapipe numpy tensorflow
Run the application:
python app.py
A window will open showing the live camera feed with real-time gesture tracking and interpretation.
The project includes Jupyter notebooks for training and testing the classifiers:
jupyter notebook keypoint_classification.ipynb
jupyter notebook point_history_classification.ipynb
These notebooks cover:
- Data preprocessing
- Feature extraction
- Model training
- Model evaluation
- Exporting trained models
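Since the notebooks cover exporting trained models, the last step can be sketched as a Keras-to-TFLite conversion. The toy model below is a stand-in: its architecture and 4-class output are assumptions, and only the 42-value input (21 MediaPipe hand landmarks × 2 coordinates) follows from the pipeline described above.

```python
import tensorflow as tf

# Toy stand-in for the trained keypoint classifier: 42 inputs
# (21 landmarks x 2 coordinates), one small hidden layer, and a
# softmax over an assumed 4 gesture classes.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(42,)),
    tf.keras.layers.Dense(20, activation="relu"),
    tf.keras.layers.Dense(4, activation="softmax"),
])

# Convert to TFLite for lightweight real-time inference.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # weight quantization
tflite_model = converter.convert()

with open("keypoint_classifier.tflite", "wb") as f:
    f.write(tflite_model)
```

The resulting .tflite file is what the application loads at runtime instead of the full Keras .hdf5 model.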
Potential applications include:
- Gesture-controlled user interfaces
- Augmented Reality interaction systems
- Human–Computer Interaction research
- Educational demonstrations of computer vision
- Real-time motion interpretation applications
- Designed for real-time performance on standard consumer hardware
- Modular architecture allows easy addition of new gestures and models
- Trained models can be replaced without modifying core application logic
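Swapping models works because gesture names live in the *_labels.csv files listed in the project tree, one label per row, so adding a gesture class amounts to appending a row and retraining. A minimal label loader might look like this (the loading logic is an assumption, not the project's code):

```python
import csv

def load_labels(path):
    """Read one gesture label per row from a classifier labels CSV."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row[0] for row in csv.reader(f) if row]
```

The classifier's integer prediction then indexes into this list to produce the label drawn on the video feed.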
This project is open-source.
This project uses MediaPipe for hand tracking, OpenCV for computer vision operations, and TensorFlow for model training and inference.