
🧬 Manish Kumar — Senior AI & Bioinformatics Engineer

AI-Orchestrated Pipelines | Data Science | Cloud & LLM Integration | Open-Source Innovator

Welcome to my open-source lab — where bioinformatics meets AI, cloud engineering, and automation. Over the past few months, I’ve been developing modular systems that unify LangGraph, Nextflow, RAG pipelines, LLMs, and machine learning to make scientific workflows more intelligent, adaptive, and reproducible.


🚀 Featured Projects

| Project | Description | Tech Stack |
|---|---|---|
| LangGraph BioFlow | AI-orchestrated bioinformatics workflow supervisor integrating LangGraph + Nextflow for adaptive reruns, QC-driven decisions, and workflow intelligence. | Python, Nextflow, LangGraph, LLM |
| Omni Bio Lab | Modular Django-based bioinformatics web app for genomic visualization, ML prediction, and variant annotation. | Django, Nextflow, ML, Docker |
| scAtlas Builder | Single-Cell Atlas Builder for analyzing, integrating, and visualizing scRNA-seq datasets with optional LLM-powered summaries. | FastAPI, Scanpy, CellTypist, DeepSeek-R1 |
| LIMS-X | Lightweight Django LIMS for tracking biological samples, metadata, and research projects. | Django, Docker, Python, SQLAlchemy, REST API, MySQL |
| RAG Gene Discovery Assistant | Retrieval-Augmented Generation pipeline for gene–disease literature summarization using embeddings and LLMs. | RAG, LangChain, Hugging Face, Llama3 |
| bacformer-comparative-functional-pipeline | Pipeline to extract protein sequences, generate Bacformer embeddings, and perform comparative analysis using cosine similarity. | Transformers, ML, Bioinformatics |
| DeepVariant Fine-Tuning | Demonstrates DeepVariant variant calling with fine-tuning on small genomic datasets and compares performance against the pretrained models. | TensorFlow, Genomics, Docker, ML |
| Bioinfo LoRA Fine-Tuning | Fine-tunes TinyLlama-1.1B for bioinformatics instruction–response tasks using LoRA, and compares model outputs before and after fine-tuning. | LoRA, LLM, NLP, Transformers |
| Llama3 Protein Pathway Summarizer | Demonstrates how Llama 3 synthesizes protein and pathway knowledge for bioinformatics research. | Llama3, Python, NLP |
| Workflow Doc Gen (Llama3) | LLM-powered documentation generator for bioinformatics workflows (Nextflow/WDL). | Llama3, WDL, Nextflow |
| Llama3 Variant Interpretation | Automated variant interpretation pipeline using VEP, ANNOVAR, and Llama3. | ANNOVAR, VEP, Llama3, Ollama |
| AWS Portfolio Projects | End-to-end cloud automation and ML deployment using AWS SageMaker, Glue, Lambda, and Batch. | AWS, SageMaker, Glue, Lambda |
| AI Dev Docker | Preconfigured GPU-ready AI development environment with CUDA, Hugging Face, Ollama, R, and JupyterLab. | Docker, CUDA, PyTorch, Ollama |
| PipelineWorks | Collection of data engineering projects showcasing scalable ETL pipelines, Airflow orchestration, and Azure workflows. | Airflow, ADF, Databricks, Python |
| Applied AI & Data Science Lab | A curated collection of applied AI, machine learning, and data science projects integrating predictive modeling, NLP, bioinformatics, and data engineering workflows. | Python, SQL, Deep Learning, ML Algorithms, AI |
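
The bacformer-comparative-functional-pipeline compares Bacformer embeddings using cosine similarity. As a rough illustration of that comparison step only, here is a minimal pure-Python sketch. It assumes each genome (or protein) has already been reduced to a fixed-length embedding vector; the function names `cosine_similarity` and `pairwise_similarity` and the toy vectors are hypothetical and are not taken from the repository.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def pairwise_similarity(
    embeddings: Dict[str, Sequence[float]],
) -> Dict[Tuple[str, str], float]:
    """Score every unordered pair of named embeddings (hypothetical: one vector per genome)."""
    names = sorted(embeddings)
    scores: Dict[Tuple[str, str], float] = {}
    for i, left in enumerate(names):
        for right in names[i + 1:]:
            scores[(left, right)] = cosine_similarity(embeddings[left], embeddings[right])
    return scores


if __name__ == "__main__":
    # Toy 3-dimensional vectors standing in for real Bacformer embeddings (assumption).
    toy = {
        "genome_A": [0.9, 0.1, 0.3],
        "genome_B": [0.8, 0.2, 0.4],
        "genome_C": [-0.5, 0.9, 0.0],
    }
    # Print pairs from most to least similar.
    ranked = sorted(pairwise_similarity(toy).items(), key=lambda kv: -kv[1])
    for (left, right), score in ranked:
        print(f"{left} vs {right}: {score:.3f}")
```

Because cosine similarity ignores vector magnitude, two embeddings pointing in the same direction score 1.0 regardless of scale. For thousands of genomes, a vectorized routine such as scikit-learn's `cosine_similarity` would be the more practical choice.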

🧠 Areas of Focus

  • AI + Bioinformatics Integration — LLMs for genomic data interpretation, annotation, and summarization
  • Data Engineering — ETL pipelines, orchestration with Nextflow and Airflow
  • Cloud Computing — AWS SageMaker, Glue, Lambda, Batch, Azure Data Factory
  • Machine Learning — Deep learning, NLP, RAG, embeddings, LoRA fine-tuning
  • Web & Systems Development — Django, FastAPI, REST APIs, LIMS design
  • Reproducibility & Infrastructure — Docker, CUDA, GitHub Actions, CI/CD

📚 Publications & Research

  • Co-author of 2 publications, with 2 more in progress
  • 86 citations across genomics and computational biology

🛠️ Technical Skills

Languages: Python, SQL, R, C++, JavaScript

Frameworks & Tools: PyTorch, TensorFlow, LangChain, LangGraph, Nextflow, FastAPI, Django

Cloud & DevOps: AWS, Azure, Docker, GitHub Actions, CI/CD

Databases: MySQL, MongoDB, PostgreSQL


🌐 Connect


💡 “Building AI-driven scientific systems that understand data — not just process it.”
