I'm a Machine Learning Engineer + Researcher currently pursuing my M.S. in Computer Science at NYU Courant (GPA: 4.0).
My work sits at the intersection of:
- LLM reasoning, retrieval & attention mechanisms
- Healthcare ML & safety for deployed clinical AI
- Document intelligence, VLMs, synthetic data generation
- Production ML systems & model monitoring
Working with Prof. Eunsol Choi on multilingual LLM retrieval:
- Improving in-context fact retrieval across 5 languages
- Modifying attention mechanisms in LLaMa-3.2-8B, Qwen-2.5-7B, Phi-3.5
- Achieved 15% retrieval gains with 30% lower KV-cache
Building production-grade safety systems for ML models powering clinical workflows across 23 hospitals.
- Designed drift detection pipelines (K-S, PSI, DeLong)
- Real-time monitoring with Prometheus + Grafana
- Extensive work with HIPAA-compliant datasets (EPIC COSMOS, OMOP CDM, Caboodle, Clarity)
- Co-authored NIH & PCORI grant proposals
Published at ICDAR 2025 (Oral, Top 2%).
- Generated 18k synthetic slides using a novel LLM pipeline
- Boosted VLM performance by 13% mAP and 10% Recall@K
- HuggingFace model reached 500+ downloads
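
As a quick reference for the Recall@K metric mentioned above, here is a minimal sketch of how it is typically computed for one query. The function name and the example IDs are hypothetical, and this is not the paper's evaluation code.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant items that appear in the top-k ranked results."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("need at least one relevant item")
    hits = len(relevant & set(ranked_ids[:k]))
    return hits / len(relevant)


# Hypothetical retrieval result: two relevant slides, one of them in the top 2.
ranking = ["slide_a", "slide_b", "slide_c", "slide_d"]
relevant = {"slide_a", "slide_d"}
score = recall_at_k(ranking, relevant, k=2)
```

Averaging this value over all queries gives the dataset-level Recall@K.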
AI-Generated Lecture Slides for Improving Slide Element Detection and Retrieval
Maniyar, Trivedi et al.
Project: https://synslidegen.github.io
DOI: https://doi.org/10.1007/978-3-032-04614-7_11
Code: https://github.com/NerdyVisky/adaptive-gpu-hashtable
- Built a high-performance adaptive GPU hash table in C++/CUDA using cooperative groups and elected-lane atomics
- Achieved 21× faster inserts and 20× faster lookups, outperforming naïve GPU hashing at scale
- Implemented epoch-based dynamic resizing + compaction for non-blocking concurrency
- Sustains stable throughput on 100M+ operations, even at 0.99 load factor
- Designed Attention-Aware DPO improving multi-image VQA accuracy by 8.5%
- Applied AdaptVis for inference boosts → 10% over base model
- Built LLM-as-a-judge with Gemini-2.5-Pro
Code: https://github.com/harsh-sutariya/AA-DPO
Website: https://nerdyvisky.github.io/projects/AttnDPO/
- Refactored full codebase for faster, leaner execution
- Built vectorized dataloaders, added flash-attention, integrated vLLM
- Reduced inference time 2 hrs → 30 mins (4× faster)
Code: https://github.com/NerdyVisky/multilingual-retrieval-translation-heads
Python · C/C++ · R · SQL (Postgres, MySQL) · JavaScript · TypeScript · Bash/Zsh
PyTorch · TensorFlow · HuggingFace · LangChain
NumPy · Pandas · scikit-learn · Matplotlib
AWS · GCP · Azure · Databricks
Docker · Git · Redis · MongoDB
Prometheus · Grafana
Thanks for stopping by! Feel free to reach out if you're working on LLMs, retrieval, ML safety, or healthcare AI.
[Nov 2025] - I am looking for full-time SDE/MLE and Applied Science roles based in the US, starting Summer 2026. I am a US Permanent Resident (Green Card) and require no visa sponsorship. If you're hiring and like my work, feel free to reach me by email: vishvesh106@gmail.com


