I design and build end-to-end ML systems, NLP/RAG pipelines, and production-grade Python/MLOps tooling.
My engineering focus is on reproducibility, structured retrieval, robust data pipelines, and clean, testable ML code.
I specialize in:
- NLP / RAG systems (structured retrieval, SPARQL reasoning, knowledge graphs)
- Model reproducibility (MLflow, DVC, deterministic pipelines)
- Python engineering (packaging, CI/CD, testing, modular design)
- High-performance data processing (parallel pipelines, QC systems)
I care about building ML systems that are reliable, interpretable, and easy for others to run and extend.
SvPhaser: a parallel, chromosome-aware SV phasing tool with confidence scoring, packaged as a CLI with full tests, CI, and docs.
➡️ https://github.com/SFGLab/SvPhaser
INFERMed: a clinical retrieval assistant built on DrugBank/FDA/PubChem RDF data, with multi-stage retrieval and structured evidence reasoning.
➡️ https://github.com/PM-0125/INFERMed
lophos: format-aware parsing, QC metrics, deterministic workflows, and reproducible CLI tooling (pytest, mypy, ruff, and CI).
➡️ https://github.com/SFGLab/lophos
Languages: Python · C++ · SQL · SPARQL
ML/AI: PyTorch · TensorFlow · XGBoost · scikit-learn · NLP · RAG · Feature Engineering
MLOps: MLflow · DVC · Docker · Conda · GitHub Actions · CI/CD · pytest · mypy · ruff · black
Data/DB: Pandas · NumPy · PostgreSQL · MySQL · RDF Knowledge Graphs · Apache Jena · QLever
Dev/Platforms: Linux · Git/GitLab · VS Code · Google Colab · GCP (Cloud Run / Vertex AI basics)
SvPhaser: parallel SV phasing with confidence scoring, shipped as a reproducible CLI tool with tests, CI, and docs.
➡️ https://github.com/SFGLab/SvPhaser
INFERMed: RAG-style structured retrieval and reasoning over biomedical knowledge graphs.
➡️ https://github.com/PM-0125/INFERMed
lophos: high-performance QC pipeline with reproducible environments and automation.
➡️ https://github.com/SFGLab/lophos
Structural Variant Detection Algorithm: a pipeline integrating read-depth and split-read signals.
➡️ https://github.com/PM-0125/Computational-Genomics/tree/main/Structural_Variant_Detection_Algorithm
Advanced Breast Cancer Analysis: comparative ML modelling (XGBoost, PCA, Isolation Forest) on the METABRIC dataset.
➡️ https://github.com/PM-0125/AI_ML_Projects/tree/main/Advanced%20Breast%20Cancer%20Analysis
- M.Sc., Computer Science & Information Systems (AI) — Warsaw University of Technology
- B.Tech., Computer Engineering (AI) — Marwadi University
I teach ML in a practical, builder-first way.
📧 Email: pranjulmishra228161@gmail.com
🔗 LinkedIn: https://www.linkedin.com/in/pranjul-mishra/
💻 GitHub: https://github.com/PM-0125