skerk001/README.md

Samir Kerkar - Data Scientist | Machine Learning | Causal Inference

M.S. Data Science @ UC San Diego (incoming) · B.S. Mathematics, UC Irvine
📧 Samir2000VIP@gmail.com · 💼 LinkedIn


4+ years of healthcare data science: causal inference, predictive modeling, and NLP across 60,000+ patients and 20+ facilities. Heart failure outcomes manuscript under peer review (n = 3,024, 11 clinics). COPD cost-effectiveness analysis showing an $83.50 per-member-per-month (PMPM) cost reduction (p = 0.0027).


🔬 Featured Projects

GenomicsGPT — ML + LLM pipeline for clinical variant interpretation. XGBoost/LightGBM ensemble on 1.69M ClinVar variants (AUC 0.985, leakage-corrected) with SHAP explainability and Llama 3 / Claude report generation.

ClinicalRAG — RAG system for clinical QA over 220 discharge summaries with hallucination guardrails and citation tracking. 97.6% condition recall, 95.2% abstention accuracy.

CausalCare — Causal inference on ICU beta-blocker treatment effects using propensity matching, IPW, doubly robust estimation, Double ML, and Causal Forest via EconML/DoWhy.

Diabetic Retinopathy Classification — CNN-based 5-class severity grading from retinal fundus images (F1 = 0.94) with GradCAM interpretability. Paper

Also: REIGN NBA Analytics (cross-era player impact models, 29,969 player-seasons) · Gene Expression Cancer Prediction (AML vs. ALL classification, F1 = 0.95)


🛠️ Tech Stack

ML/AI: Python · scikit-learn · XGBoost · LightGBM · TensorFlow/Keras · SHAP · EconML/DoWhy · R
LLM/NLP: Llama 3 · Claude API · LangChain · ChromaDB · RAG pipelines
Engineering: React · TypeScript · Flask · FastAPI · PostgreSQL · SQL · Git
Domain: EHR/clinical data · genomics · causal inference · healthcare analytics

In my free time — chess (2500+ rated), basketball, piano, and gaming.

Pinned

  1. diabetic-retinopathy-classification (Public)

    CNN-based 5-class diabetic retinopathy severity classification from retinal fundus images (F1 = 0.94)

  2. gene-cancer-prediction (Public)

    ML classification of AML vs. ALL leukemia subtypes from gene expression data (F1 = 0.95)

    Jupyter Notebook

  3. clinical-rag (Public)

    RAG system for clinical question answering over 220 discharge summaries with hallucination guardrails, citation tracking, and chunking strategy evaluation (97.6% condition recall)

    Python

  4. genomicsgpt (Public)

    ML + LLM pipeline for genetic variant pathogenicity prediction (AUC 0.9949, 1.69M ClinVar variants) with SHAP explainability and clinical report generation via Llama 3 / Claude

    Jupyter Notebook

  5. CausalCare (Public)

    Causal inference analysis of ICU beta-blocker treatment effects using propensity matching, IPW, doubly robust estimation, Double ML, and Causal Forest on eICU data

    Python

  6. reign-web (Public)

    NBA player impact analytics across 80 years. Era-specific composite models, playoff opponent adjustments, and interactive visualizations for 3,484 players (1946–2025).

    JavaScript