This project implements a real-time object detection and distance estimation system with audio feedback using YOLOv4-tiny, OpenCV, and Flask. The system detects various objects in a video stream, estimates their distance from the camera, and provides audio feedback about the detected objects and their distances.
The main features of this project include:
- Real-time object detection using YOLOv4-tiny
- Distance estimation for detected objects
- Audio feedback for detected objects and their distances
- Web interface for viewing the video stream with detections
This system can be particularly useful for:
- Assisting visually impaired individuals in navigating their environment
- Enhancing situational awareness in various scenarios
- Educational purposes in computer vision and AI
Prerequisites:
- Python 3.7+
- Conda (for environment management)
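- CUDA-capable GPU (optional, for best performance; the opencv-python wheel installed by pip is CPU-only, so GPU inference needs an OpenCV build with CUDA support)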
- Clone the repository:
git clone https://github.com/dhananjay6561/object-detection-distance-estimation.git
cd StepSense
- Create a Conda environment:
conda create -n object_detection python=3.12
conda activate object_detection
- Install required packages:
pip install opencv-python flask pyttsx3 numpy
- Download YOLOv4-tiny weights:
wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.weights
- Start the Flask application:
python app.py
- Open a web browser and navigate to http://localhost:5000 to view the video stream with object detection and distance estimation.
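For orientation, a live stream like the one viewed in the browser is commonly delivered as MJPEG over a `multipart/x-mixed-replace` response. The sketch below shows that pattern under stated assumptions: the `/video_feed` route, the `frame` boundary name, and the helper names are hypothetical and are not taken from `app.py`.

```python
def mjpeg_chunks(jpeg_frames, boundary=b"frame"):
    """Wrap each encoded JPEG in one multipart section for the browser."""
    header = b"--" + boundary + b"\r\nContent-Type: image/jpeg\r\n\r\n"
    for jpeg in jpeg_frames:
        yield header + jpeg + b"\r\n"


def camera_jpegs(camera_index=0):
    """Yield JPEG-encoded frames from a webcam until it stops delivering."""
    import cv2  # imported lazily so mjpeg_chunks can be used without OpenCV

    cap = cv2.VideoCapture(camera_index)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            # Detection boxes and distance labels would be drawn on frame here.
            ok, buf = cv2.imencode(".jpg", frame)
            if ok:
                yield buf.tobytes()
    finally:
        cap.release()


def create_app():
    """Build a small Flask app exposing the stream (hypothetical route name)."""
    from flask import Flask, Response

    app = Flask(__name__)

    @app.route("/video_feed")
    def video_feed():
        return Response(
            mjpeg_chunks(camera_jpegs()),
            mimetype="multipart/x-mixed-replace; boundary=frame",
        )

    return app


if __name__ == "__main__":
    create_app().run(host="0.0.0.0", port=5000)
```

An HTML template can then point an `<img>` tag at the route, and the browser replaces the image each time a new section arrives.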
Project structure:
- app.py: Main application file containing object detection, distance estimation, and Flask server code
- yolov4-tiny.cfg: YOLOv4-tiny configuration file
- classes.txt: File containing class names for object detection
- ReferenceImages/: Directory containing reference images for distance calibration
- templates/: Directory containing HTML templates for the web interface
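The weights, configuration, and class-name files are the usual inputs to OpenCV's DNN module. As a rough sketch (not necessarily how `app.py` does it), they could be loaded and used as follows. The 416x416 input size, the thresholds, and all function names are assumptions for illustration.

```python
from pathlib import Path


def load_class_names(path="classes.txt"):
    """Read one class name per line, skipping blank lines."""
    text = Path(path).read_text(encoding="utf-8")
    return [line.strip() for line in text.splitlines() if line.strip()]


def build_detector(weights="yolov4-tiny.weights", cfg="yolov4-tiny.cfg"):
    """Wrap the Darknet files in OpenCV's high-level detection model."""
    import cv2  # imported lazily so load_class_names works without OpenCV

    net = cv2.dnn.readNet(weights, cfg)
    model = cv2.dnn_DetectionModel(net)
    # Assumed, typical YOLOv4-tiny preprocessing: 416x416 input, 0-1 pixel range.
    model.setInputParams(size=(416, 416), scale=1 / 255, swapRB=True)
    return model


def detect(model, frame, names, conf_threshold=0.4, nms_threshold=0.3):
    """Return (class name, confidence, (x, y, w, h)) for each detection."""
    class_ids, scores, boxes = model.detect(frame, conf_threshold, nms_threshold)
    return [
        (names[int(cid)], float(score), tuple(int(v) for v in box))
        for cid, score, box in zip(class_ids, scores, boxes)
    ]
```

Each returned box is in pixels, so its width can feed directly into a distance estimate.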
How it works:
- Object Detection: The system uses YOLOv4-tiny, a lightweight version of the YOLO (You Only Look Once) object detection algorithm, to detect objects in real time from the video stream.
- Distance Estimation: The camera's focal length is first computed from the known real-world size of an object and its apparent size in a reference image. That focal length is then used to approximate the distance of each detected object from the camera.
- Audio Feedback: The system uses pyttsx3 to announce detected objects and their estimated distances. The volume of the announcement is adjusted according to the estimated distance.
- Web Interface: A Flask web server streams the processed video, with bounding boxes and distance information drawn on it, to a web browser.
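The distance and volume steps above can be sketched with the standard pinhole-camera (similar triangles) relation. This is a minimal illustration, not the project's exact code. It assumes each reference image was taken at a known distance, and that nearer objects should sound louder. The function names, the 300 cm range, and the sample numbers are hypothetical.

```python
def focal_length_px(ref_distance, real_width, ref_width_px):
    """Focal length in pixels from one reference image at a known distance."""
    if real_width <= 0:
        raise ValueError("real_width must be positive")
    return (ref_width_px * ref_distance) / real_width


def estimate_distance(focal_px, real_width, width_px):
    """Distance in the same unit as real_width, by similar triangles."""
    if width_px <= 0:
        raise ValueError("width_px must be positive")
    return (real_width * focal_px) / width_px


def volume_for_distance(distance, max_distance=300.0, min_volume=0.2):
    """Map a distance to a volume in [min_volume, 1.0]; nearer is louder (assumed)."""
    clamped = min(max(distance, 0.0), max_distance)
    nearness = 1.0 - clamped / max_distance
    return min_volume + (1.0 - min_volume) * nearness


def announce(label, distance):
    """Speak one detection; pyttsx3 is imported lazily so the math runs without it."""
    import pyttsx3

    engine = pyttsx3.init()
    engine.setProperty("volume", volume_for_distance(distance))  # 0.0 to 1.0
    engine.say(f"{label}, {distance:.0f} centimetres")
    engine.runAndWait()


if __name__ == "__main__":
    # Hypothetical calibration: a 40 cm wide object photographed from 100 cm
    # appears 200 px wide in the reference image.
    focal = focal_length_px(ref_distance=100.0, real_width=40.0, ref_width_px=200.0)
    # The same object later appears 100 px wide in the live frame.
    print(estimate_distance(focal, real_width=40.0, width_px=100.0))
```

Because the estimate depends only on the bounding-box width, it is most reliable for objects facing the camera at roughly the same angle as in the reference image.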
Contributions to improve the project are welcome! Here's how you can contribute:
- Fork the repository
- Create your feature branch:
git checkout -b feature/AmazingFeature
- Commit your changes:
git commit -m 'Add some AmazingFeature'
- Push to the branch:
git push origin feature/AmazingFeature
- Open a Pull Request
Areas for potential improvement:
- Enhancing distance estimation accuracy
- Adding support for more object types
- Improving the web interface
- Optimizing performance for lower-end devices