AI-Orchestrated Pipelines | Data Science | Cloud & LLM Integration | Open-Source Innovator
Welcome to my open-source lab, where bioinformatics meets AI, cloud engineering, and automation. Over the past few months, I’ve been developing modular systems that combine LangGraph, Nextflow, RAG pipelines, LLMs, and machine learning to make scientific workflows more intelligent, adaptive, and reproducible.
| Project | Description | Tech Stack |
|---|---|---|
| LangGraph BioFlow | AI-orchestrated bioinformatics workflow supervisor integrating LangGraph + Nextflow for adaptive reruns, QC-driven decisions, and workflow intelligence. | Python, Nextflow, LangGraph, LLM |
| Omni Bio Lab | Modular Django-based bioinformatics web app for genomic visualization, ML prediction, and variant annotation. | Django, Nextflow, ML, Docker |
| scAtlas Builder | Single-Cell Atlas Builder for analyzing, integrating, and visualizing scRNA-seq datasets with optional LLM-powered summaries. | FastAPI, Scanpy, CellTypist, DeepSeek-R1 |
| LIMS-X | Lightweight Django LIMS for tracking biological samples, metadata, and research projects. | Django, Docker, Python, SQLAlchemy, REST API, MySQL |
| RAG Gene Discovery Assistant | Retrieval-Augmented Generation pipeline for gene–disease literature summarization using embeddings and LLMs. | RAG, LangChain, Hugging Face, Llama3 |
| bacformer-comparative-functional-pipeline | Pipeline to extract protein sequences, generate Bacformer embeddings, and perform comparative analysis using cosine similarity. | Transformers, ML, Bioinformatics |
| DeepVariant Fine-Tuning | Demonstrates DeepVariant variant calling, fine-tuning on small genomic datasets, and a performance comparison against the pretrained models. | TensorFlow, Genomics, Docker, ML |
| Bioinfo LoRA Fine-Tuning | Fine-tunes TinyLlama-1.1B for bioinformatics instruction–response tasks using LoRA. Includes a comparison of model outputs before and after fine-tuning. | LoRA, LLM, NLP, Transformers |
| Llama3 Protein Pathway Summarizer | Demonstrates how Llama 3 synthesizes protein and pathway knowledge for bioinformatics research. | Llama3, Python, NLP |
| Workflow Doc Gen (Llama3) | LLM-powered documentation generator for bioinformatics workflows (Nextflow/WDL). | Llama3, WDL, Nextflow |
| Llama3 Variant Interpretation | Automated variant interpretation pipeline using VEP, ANNOVAR, and Llama3. | ANNOVAR, VEP, Llama3, Ollama |
| AWS Portfolio Projects | End-to-end cloud automation and ML deployment using AWS SageMaker, Glue, Lambda, and Batch. | AWS, SageMaker, Glue, Lambda, Batch |
| AI Dev Docker | Preconfigured GPU-ready AI development environment with CUDA, Hugging Face, Ollama, R, and JupyterLab. | Docker, CUDA, PyTorch, Ollama |
| PipelineWorks | Collection of data engineering projects showcasing scalable ETL pipelines, Airflow orchestration, and Azure workflows. | Airflow, ADF, Databricks, Python |
| Applied AI & Data Science Lab | A curated collection of applied AI, machine learning, and data science projects spanning predictive modeling, NLP, bioinformatics, and data engineering workflows. | Python, SQL, Deep Learning, ML Algorithms, AI |
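The comparative step in bacformer-comparative-functional-pipeline scores how alike two proteins are by the cosine similarity of their embedding vectors. Below is a minimal, generic sketch of that idea, not the project's actual code. It assumes the embeddings have already been generated and are stored as plain lists of floats keyed by protein ID; the IDs `protA`/`protB`/`protC` and the 4-dimensional vectors are hypothetical toy values, whereas real protein embeddings have hundreds of dimensions.

```python
import math
from itertools import combinations


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors, in [-1, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def pairwise_similarities(
    embeddings: dict[str, list[float]],
) -> dict[tuple[str, str], float]:
    """Score every unordered pair of proteins by cosine similarity."""
    return {
        (p, q): cosine_similarity(embeddings[p], embeddings[q])
        for p, q in combinations(sorted(embeddings), 2)
    }


if __name__ == "__main__":
    # Hypothetical toy embeddings; real ones come from the embedding model.
    toy = {
        "protA": [0.9, 0.1, 0.0, 0.3],
        "protB": [0.8, 0.2, 0.1, 0.3],
        "protC": [0.0, 0.9, 0.7, 0.1],
    }
    # Rank protein pairs from most to least similar.
    ranked = sorted(
        pairwise_similarities(toy).items(), key=lambda kv: kv[1], reverse=True
    )
    for (p, q), score in ranked:
        print(f"{p} vs {q}: {score:.3f}")
```

Cosine similarity compares only the direction of two embeddings and ignores their magnitude, so pairs scoring close to 1 (protA and protB in the toy data) are the candidates for shared function.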
- AI + Bioinformatics Integration — LLMs for genomic data interpretation, annotation, and summarization
- Data Engineering — ETL pipelines, orchestration with Nextflow and Airflow
- Cloud Computing — AWS SageMaker, Glue, Lambda, Batch, Azure Data Factory
- Machine Learning — Deep learning, NLP, RAG, embeddings, LoRA fine-tuning
- Web & Systems Development — Django, FastAPI, REST APIs, LIMS design
- Reproducibility & Infrastructure — Docker, CUDA, GitHub Actions, CI/CD
- Co-author of 2 publications, with 2 more in progress
- 86 citations in genomics and computational biology
Languages: Python, SQL, R, C++, JavaScript
Frameworks & Tools: PyTorch, TensorFlow, LangChain, LangGraph, Nextflow, FastAPI, Django
Cloud & DevOps: AWS, Azure, Docker, GitHub Actions, CI/CD
Databases: MySQL, MongoDB, PostgreSQL
- LinkedIn: linkedin.com/in/manish-kumar-0160837
- GitHub: github.com/man4ish
- Email: mandecent.gupta@gmail.com
💡 “Building AI-driven scientific systems that understand data — not just process it.”


