dbiswas55/README.md

Hi, this is Dipayan Biswas 👋

I’m a PhD researcher in Computer Science at the University of Houston, working on computer vision, multimodal AI, and video understanding.

Lately, I’ve been exploring how multimodal systems can make better sense of long videos, connect language with visuals, and generate more useful summaries and chapter-level understanding.

This GitHub is where I share projects and tools across multimodal summarization, vision-language systems, object detection, and practical ML workflows.

If any of this sounds interesting, feel free to reach out: dipayan1109033@gmail.com


🌐 Website · LinkedIn · Email


Research Interests

Multimodal AI · Computer Vision · Video Understanding · Vision-Language Models · Multimodal Summarization · Visual Grounding

Skills

Python · SQL · PyTorch · Lightning · Transformers · OpenCV · Hydra · MLflow · DeepEval · Git · AWS

GitHub Stuff

Pinned

  1. VLM-Inferences (Public)

    A lightweight, config-driven framework for unified vision-language model inference across local and cloud backends.

    Python 1

  2. SOTA-Detection-Lab (Public)

    SOTA-Detection-Lab is a PyTorch-based framework for training, fine-tuning, and evaluating state-of-the-art object detection models on custom datasets. It supports Faster R-CNN, YOLO, CenterNet, Eff…

    Python 6

  3. edu-video-visual-detection (Public)

    Code and resources for the paper "Visual Content Detection in Educational Videos with Transfer Learning and Dataset Enrichment" (IEEE MIPR 2025).

    Python 2

  4. LearnMLOpsTools (Public)

    Hands-on MLOps tooling study: a side-by-side comparison of WandB and MLflow using PyTorch and FashionMNIST. Covers experiment tracking, metric logging, model artifacts, and visual analysis. More tools will be added as they are studied.

    Python 1

  5. Pytorch_Retinaface-v.2 (Public)

    Forked from biubug6/Pytorch_Retinaface

    Improved RetinaFace for long-distance face detection using an IoU-aware loss and auxiliary IoU head.

    Python

  6. LVVO_dataset (Public)

    A dataset of lecture video frames annotated with visual elements such as tables, charts, photographs, and illustrations, designed for visual content detection and educational video analysis.

    1