Amr Amer amramer

Amr Amer

Machine Learning Engineer • Computer Vision • Multimodal Generative Models

Building intelligent systems for visual understanding, behavior generation, and real-time AI.

🚀 Featured Projects

🎓 Master’s Thesis: Personality-Aware Non-verbal Behavior Generation

Multimodal generative model for generating realistic listener avatars in dyadic conversations, conditioned on personality traits. The model predicts facial expressions, head motion, and upper-body gestures of a listener from the speaker’s audio and motion signals.

Tech Stack: PyTorch · Vision-Transformers · VQ-VAE · SMPL-X (PIXIE)-Librosa · OpenCV · CUDA · TensorBoard SLURM · Enroot · Multi-GPU Training (A100)
Demo: Thesis Website
Paper: Full Thesis Document

🏸 Badminton-VisionAI

End-to-end computer vision pipeline for badminton match analysis that converts badminton match videos into structured performance analytics for players and coaches. The system tracks both players and the shuttlecock, detects shot events, projects motion onto a mini-court representation, and generates a downloadable coach-style performance report.

Tech Stack: Python · PyTorch · OpenCV · YOLO · ByteTrack · Streamlit · SAM · Plotly · Docker · CUDA TrackNet · CI/CD Pipeline · Homography · Body Poses estimation · 3D Motion Trajectories
Demo: Live Demo | CV. Pipeline Video | Dashboard Video

🚗 Semantic Segmentation for Autonomous Vehicles

This project focuses on semantic segmentation using the BDD100K dataset, a large-scale, diverse dataset for autonomous driving. The main objective is to accurately segment and identify various objects in street scenes, which is important for improving the AI perception vision of autonomous vehicles.

Tech Stack: PyTorch · Fastai · Semantic Segmentation · Hyperparamter Tunning · Weights & Biases

🩺 3D Brain Tumor Segmentation (MRI)

An end-to-end workflow for multi-label brain tumor segmentation from 3D multimodal MRI scans. The project targets segmentation of glioma subregions (tumor core, whole tumor, enhancing tumor) using a 3D SegResNet model.

Tech Stack: PyTorch · MONAI · 3D SegResNet · Multi-modal MRI 3D Medical Image Transforms · Sliding Window Inference · Experiment Tracking (W&B)

🎥 Realtime Vision Captioning

This repository contains a curated set of Jupyter notebooks demonstrating core computer vision and vision–language capabilities using VLM foundation models. The notebooks progress from offline image understanding tasks to a realtime webcam application that performs image captioning and image classification on live video streams.

Tech Stack: PyTorch · Torchvision · Hugging Face Transformers · Gradio · PIL · VLM · Realtime inference

🎙️ AI Conversational Agent (Ongoing)

This AI Agent is more than just a chatbot. It is real-time voice/text interaction, LLM-powered reasoning, and a digital human conversational model.

Tech Stack: Python · OpenAI GPT · Speech-to-Text · Text-to-Speech · HTML · CSS · JavaScript · Bootstrap · jQuery

🌐 Portfolio

For more details about my work and projects, visit my portfolio website.

🧠 Technical Focus

Area	Skills & Tools
Computer Vision	Object Detection, Segmentation, Tracking, Video Motion Analysis, Digital Human Models
Models & Frameworks	PyTorch, TensorFlow, OpenCV, YOLO, Vision Transformers (ViT), Supervision
Training & Evaluation	Transfer Learning, Hyperparameter Optimization, Weights & Biases, TensorBoard
Deployment	Docker, AWS (EC2 GPU), Streamlit, Batch & Real-time Inference
Software Engineering	Python, C++, Object-Oriented Design, Git, Debugging, Unit Testing

Provide feedback

Saved searches

Use saved searches to filter your results more quickly