sherozshaikh/README.md

Hey there, I'm Sheroz 👋

Machine Learning Engineer & Data Scientist
Building production ML systems, from LLM-powered automation to healthcare AI

LinkedIn Email GitHub


🚀 About Me

  • 🎓 M.S. in Data Science @ Worcester Polytechnic Institute (WPI) | GPA: 3.9/4.0
  • 🏆 Winner of the Best Data Science Project award (1st place out of 20+ teams) for a healthcare ML project
  • 🏥 5+ years building production ML systems across healthcare, fintech, and IoT
  • 🤖 Passionate about LLMs, semantic search, and ML pipeline automation
  • 📦 Open-source contributor: published 4 Python packages on PyPI
  • 📍 Boston, MA

πŸ—οΈ What I've Built

  • LLM-Powered Ticket Routing: Claude API-based system automating 40% of classification workflows, saving ~$700/month in operational costs
  • ICD-10 Medical Coding System: production LLM serving 10+ enterprise healthcare clients and processing 100K+ requests per month
  • Semantic Search Platform: vector embeddings over 940K healthcare documents, delivering ~$80K/month in operational savings
  • ML Document Classifier: production classifier automating 80% of daily document triage (900+ docs/day) with 99%+ uptime
  • Time-Series Forecasting: PyTorch pipeline predicting equipment failures 30 days in advance
  • LoRA Fine-Tuning Pipeline: end-to-end text classification with parameter-efficient fine-tuning and reproducible benchmarking

🧰 Tech Stack

AI & ML Frameworks

PyTorch Scikit-learn HuggingFace LangChain XGBoost

LLMs & Vector Search

Claude OpenAI FAISS Pinecone Chroma

Data Engineering & ETL

PySpark Airflow Polars SQL

Production & MLOps

FastAPI Docker AWS MLflow GitHub Actions Prometheus

Languages

Python SQL Linux


📈 Highlights

  • 🏥 Deployed a production LLM for ICD-10 medical coding serving 10+ enterprise healthcare clients
  • 🔍 Built semantic search over 940K documents, saving ~$80K/month in operational costs
  • ⚑ Automated 80% of daily document triage with an ML classifier (900+ docs/day)
  • 📊 Optimized PySpark ETL for 15M+ Medicare records: 75% fewer data scans, 58% faster queries
  • 📦 Published 4 open-source Python packages on PyPI for ML pipeline tooling
  • 🏆 1st place, WPI Best Data Science Project (Winter 2024)



💬 Let's connect! Always happy to chat about ML engineering, LLMs, healthcare AI, or open source.

Popular repositories

  1. yolo_streamlit_pipeline (Public)

    YOLO Model Training and Inference Pipeline with Streamlit

    Jupyter Notebook 2 1

  2. text_to_vector_embedding_pipeline (Public)

    Text Embedding and Model Fetching Toolkit

    Jupyter Notebook 1

  3. json2ubl (Public)

    Production-grade JSON to UBL 2.1 XML converter with schema-driven mapping

    Python 1

  4. CS_534_Artificial_Intelligence (Public)

    WPI CS 534 Artificial Intelligence

    Python

  5. CS_547_Information_Retrieval (Public)

    WPI CS 547 Information Retrieval

    Jupyter Notebook

  6. WPI_CourseWork (Public)

    WPI CourseWork Assignments

    Jupyter Notebook