M.S. Data Science @ UC San Diego (incoming) · B.S. Mathematics, UC Irvine
📧 Samir2000VIP@gmail.com · 💼 LinkedIn
4+ years of healthcare data science: causal inference, predictive modeling, and NLP across 60,000+ patients and 20+ facilities. Heart failure outcomes manuscript under peer review (n=3,024, 11 clinics). COPD cost-effectiveness analysis showing an $83.50 per-member-per-month (PMPM) cost reduction (p=0.0027).
GenomicsGPT — ML + LLM pipeline for clinical variant interpretation. XGBoost/LightGBM ensemble on 1.69M ClinVar variants (AUC 0.985, leakage-corrected) with SHAP explainability and Llama 3 / Claude report generation.
ClinicalRAG — RAG system for clinical QA over 220 discharge summaries with hallucination guardrails and citation tracking. 97.6% condition recall, 95.2% abstention accuracy.
CausalCare — Causal inference on ICU beta-blocker treatment effects using propensity matching, IPW, doubly robust estimation, Double ML, and Causal Forest via EconML/DoWhy.
Diabetic Retinopathy Classification — CNN-based 5-class severity grading from retinal fundus images (F1 = 0.94) with GradCAM interpretability. Paper
Also: REIGN NBA Analytics (cross-era player impact models, 29,969 player-seasons) · Gene Expression Cancer Prediction (AML vs. ALL classification, F1 = 0.95)
| Area | Tools |
|---|---|
| ML/AI | Python · scikit-learn · XGBoost · LightGBM · TensorFlow/Keras · SHAP · EconML/DoWhy · R |
| LLM/NLP | Llama 3 · Claude API · LangChain · ChromaDB · RAG pipelines |
| Engineering | React · TypeScript · Flask · FastAPI · PostgreSQL · SQL · Git |
| Domain | EHR/clinical data · genomics · causal inference · healthcare analytics |
In my free time — chess (2500+ rated), basketball, piano, and gaming.