
Hello! I'm Samarth Saxena 👋

Data Scientist | AI Engineer | Machine Learning Engineer | Full-Stack Builder | Software Engineer

Portfolio LinkedIn


About Me

I don’t just train models; I build the infrastructure, the pipelines, and the user-facing products that make them useful. I operate with a "ship-it" attitude, bridging the gap between a model in a Jupyter notebook and one thriving in the chaos of production.

  • Education: Master of Science in Applied Data Science from USC (GPA: 3.9/4.0) and B.Tech in Computer Science from Manipal University Jaipur (CGPA: 9.17/10.0).
  • Currently: Data Scientist at Cloud9 Esports, where I work as a team of one, developing AI solutions for the company that drive revenue, accelerate growth, and create impact.
  • Philosophy: Complexity is often just a lack of initiative. I pride myself on being the person who unblocks the team by building solutions from scratch.
  • Current Focus: Build, build and build some more.


Tech Stack

Programming & Core Languages

Python SQL C++ JavaScript TypeScript Java Scala Bash Tailwind

AI, LLMs & Machine Learning

Gemini Claude Llama 2 LangChain RAG Fine-Tuning HuggingFace Reinforcement Learning Computer Vision BERT XGBoost n8n

Full-Stack, SWE & DevOps

Flask SQLAlchemy REST API AsyncIO Docker GitHub Actions Git Unit Testing

Cloud, Data & Infrastructure

GCP AWS Azure Airflow Spark PostgreSQL MongoDB DBeaver Linux

Libraries & Specialized Tools

Pandas NumPy PyTorch TensorFlow Keras FFmpeg Postman Jupyter ElevenLabs


Featured Projects

Award: Best Project & Best Data Science Developer (CKIDS DataFest ’23, USC). Automated the identification of safety-trait violations in nuclear plant reports using a GuidedLDA model that learns from domain-specific "seed words".
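The seed-word idea can be illustrated with a small collapsed Gibbs sampler for LDA. This is a minimal sketch, not the model from the project and not the `guidedlda` package's code: `seeded_lda`, the toy word ids, and all hyperparameters are hypothetical, and here seed words only bias the *initial* topic assignments (with probability `seed_confidence`) before ordinary LDA sampling takes over.

```python
import numpy as np

def seeded_lda(docs, n_topics, seed_topics, seed_confidence=0.9,
               n_iter=200, alpha=0.1, beta=0.01, rng_seed=0):
    """Hypothetical seeded LDA via collapsed Gibbs sampling.

    docs: list of documents, each a list of integer word ids
    seed_topics: dict mapping seed word id -> topic id
    """
    rng = np.random.default_rng(rng_seed)
    vocab_size = 1 + max(w for doc in docs for w in doc)
    ndk = np.zeros((len(docs), n_topics))   # document-topic counts
    nkw = np.zeros((n_topics, vocab_size))  # topic-word counts
    nk = np.zeros(n_topics)                 # total tokens per topic
    z = []                                  # topic assignment of every token

    # Initialization: seed words start in their seed topic with prob. seed_confidence
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            if w in seed_topics and rng.random() < seed_confidence:
                k = seed_topics[w]
            else:
                k = int(rng.integers(n_topics))
            z_doc.append(k)
            ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1
        z.append(z_doc)

    # Standard collapsed Gibbs sweeps: resample each token's topic
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                ndk[d, k] -= 1; nkw[k, w] -= 1; nk[k] -= 1
                p = (ndk[d] + alpha) * (nkw[:, w] + beta) / (nk + vocab_size * beta)
                k = int(rng.choice(n_topics, p=p / p.sum()))
                z[d][i] = k
                ndk[d, k] += 1; nkw[k, w] += 1; nk[k] += 1

    theta = (ndk + alpha) / (ndk + alpha).sum(axis=1, keepdims=True)  # doc-topic mix
    phi = (nkw + beta) / (nkw + beta).sum(axis=1, keepdims=True)      # topic-word dist.
    return theta, phi

# Toy corpus: word 0 is seeded into topic 0, word 3 into topic 1
docs = [[0, 1, 2, 0], [3, 4, 5, 3], [0, 2, 1, 5]]
theta, phi = seeded_lda(docs, n_topics=2, seed_topics={0: 0, 3: 1}, n_iter=50)
```

In a report-screening setting, each topic would correspond to a safety trait, and a document's `theta` row indicates how strongly it leans toward each trait.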

A TA bot designed to replace a course assistant: it processes uploaded PDFs and generates conversational responses, using HuggingFace embeddings and FAISS to retrieve relevant passages and Llama 2 to generate the answers.
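The retrieval half of such a bot can be sketched without any model downloads. This is a minimal illustration under stated assumptions: `embed` is a hypothetical hashed bag-of-words stand-in for a HuggingFace embedding model, `FlatIPIndex` is a NumPy stand-in for exact inner-product search (the same idea as FAISS's `IndexFlatIP`), and the chunk sizes, prompt template, and course text are made up. The final Llama 2 call is omitted.

```python
import re
import zlib
import numpy as np

def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping word windows (sizes are arbitrary assumptions)."""
    words = text.split()
    step = chunk_size - overlap
    return [" ".join(words[i:i + chunk_size])
            for i in range(0, max(len(words) - overlap, 1), step)]

def embed(text, dim=256):
    """Hypothetical embedding: L2-normalized hashed bag of words (deterministic crc32)."""
    vec = np.zeros(dim, dtype=np.float32)
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec

class FlatIPIndex:
    """Exact inner-product search; with unit vectors this is cosine similarity."""
    def __init__(self, dim):
        self.vectors = np.empty((0, dim), dtype=np.float32)

    def add(self, vecs):
        self.vectors = np.vstack([self.vectors, vecs])

    def search(self, query, k):
        scores = self.vectors @ query
        top = np.argsort(-scores)[:k]   # indices of the k best-matching chunks
        return scores[top], top

def build_prompt(question, chunks, index, k=2):
    """Retrieve the top-k chunks and pack them into a prompt for the LLM."""
    _, ids = index.search(embed(question), k)
    context = "\n---\n".join(chunks[i] for i in ids)
    return (f"Answer using only this course material:\n{context}\n\n"
            f"Question: {question}\nAnswer:")

doc = ("Office hours are on Tuesdays at 3 pm in room 204. "
       "The final exam covers chapters 1 through 6. "
       "Late homework loses ten percent per day.")
chunks = chunk_text(doc, chunk_size=10, overlap=2)
index = FlatIPIndex(dim=256)
index.add(np.stack([embed(c) for c in chunks]))
prompt = build_prompt("When is the final exam?", chunks, index, k=1)
# `prompt` would then be sent to the language model (e.g. Llama 2).
```

Overlapping chunks reduce the chance that an answer is split across a chunk boundary; swapping in a real embedding model and a FAISS index changes only `embed` and `FlatIPIndex`.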

Addressed concept drift and catastrophic forgetting in a dataset of 1M+ interactions, outperforming LSTM models by 7% using sliding windows and replay buffers.
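The two ingredients named above can be combined in a batch builder: a sliding window keeps training focused on recent behaviour (tracking drift), while a replay buffer re-injects older interactions (limiting forgetting). This is a minimal, model-agnostic sketch; the class names, window size, capacity, replay ratio, and the use of reservoir sampling for the buffer are illustrative assumptions rather than the project's actual configuration.

```python
import random
from collections import deque

class ReservoirReplayBuffer:
    """Keeps a uniform random sample of every item ever added (reservoir sampling)."""
    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.n_seen = 0
        self.rng = random.Random(seed)

    def add(self, item):
        self.n_seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = self.rng.randrange(self.n_seen)  # uniform in [0, n_seen)
            if j < self.capacity:
                self.items[j] = item

    def sample(self, k):
        return self.rng.sample(self.items, min(k, len(self.items)))

class DriftAwareBatcher:
    """Hypothetical helper: recent sliding window plus replayed older interactions."""
    def __init__(self, window_size, replay_capacity, replay_ratio=0.5, seed=0):
        self.window = deque(maxlen=window_size)
        self.replay = ReservoirReplayBuffer(replay_capacity, seed)
        self.replay_ratio = replay_ratio

    def observe(self, interaction):
        if len(self.window) == self.window.maxlen:
            # The oldest item is about to slide out; give it a chance to be replayed later
            self.replay.add(self.window[0])
        self.window.append(interaction)

    def batch(self):
        recent = list(self.window)
        n_replay = int(len(recent) * self.replay_ratio)
        return recent + self.replay.sample(n_replay)

batcher = DriftAwareBatcher(window_size=100, replay_capacity=50, replay_ratio=0.2)
for t in range(1000):        # integers stand in for user interactions
    batcher.observe(t)
batch = batcher.batch()      # 100 recent items followed by 20 replayed older ones
```

Reservoir sampling keeps the buffer an unbiased sample of the whole history in fixed memory, which is one common way to size a replay buffer for a stream of 1M+ events.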

Implemented Locality Sensitive Hashing (LSH) and collaborative filtering on the Yelp dataset, achieving a prediction RMSE of 0.978.
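The LSH step can be illustrated with MinHash signatures and banding, which surface candidate pairs of similar items without comparing every pair. This sketch assumes, for illustration only, that each business is represented by the set of (integer-encoded) user ids who rated it and that similarity is Jaccard; the business names, band/row counts, and the 0.5 threshold are hypothetical. The collaborative-filtering step that would use these neighbours is not shown.

```python
import random
from collections import defaultdict
from itertools import combinations

PRIME = 2_147_483_647  # 2**31 - 1, modulus for the hash family h(x) = (a*x + b) mod p

def minhash_signature(items, hash_params):
    """One minimum per hash function; matching minima estimate Jaccard similarity."""
    return [min((a * x + b) % PRIME for x in items) for a, b in hash_params]

def lsh_candidates(item_sets, n_bands=30, rows_per_band=2, seed=0):
    """Bucket each signature band; items sharing any bucket become candidate pairs."""
    rng = random.Random(seed)
    hash_params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
                   for _ in range(n_bands * rows_per_band)]
    buckets = defaultdict(set)
    for key, items in item_sets.items():
        sig = minhash_signature(items, hash_params)
        for band in range(n_bands):
            rows = tuple(sig[band * rows_per_band:(band + 1) * rows_per_band])
            buckets[(band, rows)].add(key)
    candidates = set()
    for keys in buckets.values():
        candidates.update(combinations(sorted(keys), 2))
    return candidates

def jaccard(a, b):
    return len(a & b) / len(a | b)

business_users = {
    "cafe_a": {1, 2, 3, 4, 5},
    "cafe_b": {1, 2, 3, 4, 6},   # Jaccard with cafe_a = 4/6
    "gym_c": {50, 51, 52},
}
# Verify candidates with the exact Jaccard similarity to drop false positives
pairs = {p for p in lsh_candidates(business_users)
         if jaccard(business_users[p[0]], business_users[p[1]]) >= 0.5}
```

With `b` bands of `r` rows, a pair with Jaccard similarity `s` becomes a candidate with probability `1 - (1 - s**r)**b`, so the band/row split trades recall against the number of candidates to verify.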

Built a layer-by-layer neural network using only NumPy, implementing backpropagation and optimization without high-level ML libraries.
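A layer-by-layer network in pure NumPy comes down to each layer caching what it needs in `forward` and returning the input gradient from `backward`. The sketch below is a minimal illustration, not the project's code: the `Dense`/`Tanh` classes, the XOR data, the 8-unit hidden layer, the learning rate, and the sigmoid plus binary cross-entropy output are all assumptions chosen to keep it small.

```python
import numpy as np

class Dense:
    """Fully connected layer: out = x @ W + b, updated with plain gradient descent."""
    def __init__(self, n_in, n_out, rng):
        self.W = rng.normal(0.0, 1.0 / np.sqrt(n_in), size=(n_in, n_out))
        self.b = np.zeros((1, n_out))

    def forward(self, x):
        self.x = x                          # cache input for the backward pass
        return x @ self.W + self.b

    def backward(self, grad_out, lr):
        grad_W = self.x.T @ grad_out
        grad_b = grad_out.sum(axis=0, keepdims=True)
        grad_in = grad_out @ self.W.T       # computed before the weights change
        self.W -= lr * grad_W
        self.b -= lr * grad_b
        return grad_in

class Tanh:
    def forward(self, x):
        self.out = np.tanh(x)
        return self.out

    def backward(self, grad_out, lr):
        return grad_out * (1.0 - self.out ** 2)  # d tanh(x)/dx = 1 - tanh(x)^2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, layers, lr=0.5, epochs=3000):
    losses = []
    for _ in range(epochs):
        z = X
        for layer in layers:                # forward pass, layer by layer
            z = layer.forward(z)
        p = sigmoid(z)
        eps = 1e-12
        loss = -np.mean(y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps))
        losses.append(loss)
        grad = (p - y) / len(X)             # d(mean BCE)/d(logits) for a sigmoid output
        for layer in reversed(layers):      # backpropagation, last layer first
            grad = layer.backward(grad, lr)
    return losses

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR: not linearly separable
layers = [Dense(2, 8, rng), Tanh(), Dense(8, 1, rng)]
losses = train(X, y, layers)
```

Folding the sigmoid into the loss gradient (`p - y`) avoids a separate, numerically fragile sigmoid backward step.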


Let's Connect


Pinned

  1. Bradley-Fayyad-Reina-BFR_Clustering_Algorithm_Python

    In this project, I implemented the Bradley-Fayyad-Reina (BFR) algorithm to cluster synthetic datasets. The goal of this assignment is to gain familiarity with the clustering process, diffe…

    Python

  2. Community_Detection_using_Girvan-Newman_in_Spark

    In this project, I explored community detection in social networks using the Yelp dataset. Leveraging Spark GraphFrames and implementing the Girvan-Newman algorithm, I aimed to detect communities o…

    Python

  3. Emulate_HDFS_MapReduce

    This project is a Python-based implementation of the Hadoop MapReduce software framework for processing large datasets using a distributed and parallel approach.

    Python

  4. Real-time-Stock-Price-Analysis-and-Algorithmic-Trading

    This project is a sophisticated system designed for real-time stock price analysis and algorithmic trading. It comprises two main components: data collection, storage, and preprocessing (Part 1), a…

    Python

  5. Web-Scraping-Analysis-Clustering-on-Real-time-data

    This project aims to perform data analysis and clustering on the Reddit Tech forum. It involves web scraping, data preprocessing, topic selection, clustering algorithms, and real-time data processi…

    Python

  6. Yelp_Data_Analysis_using_Spark

    In this project, I ran a comprehensive exploration of Yelp's vast dataset to uncover valuable insights into user recommendations and business dynamics. Leveraging Spark, a powerful distributed comp…

    Python
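For the BFR clustering project pinned above, the core trick is that each cluster is summarized by three statistics (count `N`, per-dimension `SUM`, and `SUMSQ`) instead of its points, and new points are assigned by Mahalanobis distance. This minimal sketch shows only that assignment step; `ClusterSummary`, `assign`, and the `alpha * sqrt(d)` threshold with `alpha = 2` are illustrative assumptions, and the compression/retained sets that BFR uses for unassigned points are left to the caller.

```python
import numpy as np

class ClusterSummary:
    """BFR-style cluster summary: N, SUM and SUMSQ instead of the raw points."""
    def __init__(self, points):
        points = np.asarray(points, dtype=float)
        self.n = len(points)
        self.sum = points.sum(axis=0)
        self.sumsq = (points ** 2).sum(axis=0)

    def add(self, point):
        self.n += 1
        self.sum += point
        self.sumsq += point ** 2

    def centroid(self):
        return self.sum / self.n

    def std(self):
        var = self.sumsq / self.n - self.centroid() ** 2
        return np.sqrt(np.maximum(var, 1e-12))  # guard against zero variance

    def mahalanobis(self, point):
        # Distance normalized per dimension (BFR assumes axis-aligned clusters)
        return float(np.sqrt((((point - self.centroid()) / self.std()) ** 2).sum()))

def assign(point, summaries, alpha=2.0):
    """Add point to the nearest cluster if within alpha * sqrt(d), else return None."""
    point = np.asarray(point, dtype=float)
    threshold = alpha * np.sqrt(point.shape[0])
    dists = [s.mahalanobis(point) for s in summaries]
    best = int(np.argmin(dists))
    if dists[best] < threshold:
        summaries[best].add(point)
        return best
    return None  # caller would place it in the compression or retained set

a = ClusterSummary([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
b = ClusterSummary([[10.0, 10.0], [11.0, 11.0], [10.0, 11.0], [11.0, 10.0]])
r1 = assign([0.5, 0.4], [a, b])   # close to cluster a
r2 = assign([5.0, 5.0], [a, b])   # far from both clusters
```

Because `N`, `SUM`, and `SUMSQ` are all additive, two summaries can also be merged by adding their fields, which is what lets BFR process data in a single pass.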
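For the Girvan-Newman community-detection project pinned above, the key computation is edge betweenness: run a BFS from every node, count shortest paths, and push credit back up the BFS tree. This is a single-machine sketch, not the Spark/GraphFrames implementation; the function names and the toy graph (two triangles joined by one bridge) are hypothetical, and it performs only one split, whereas a full run usually keeps removing edges and picks the partition with the best modularity.

```python
import math
from collections import defaultdict, deque

def edge_betweenness(graph):
    """graph: dict mapping node -> set of neighbours (undirected)."""
    scores = defaultdict(float)
    for root in graph:
        # BFS from root: depth, number of shortest paths, and parents of each node
        depth, n_paths, parents, order = {root: 0}, {root: 1}, defaultdict(list), []
        queue = deque([root])
        while queue:
            node = queue.popleft()
            order.append(node)
            for nb in graph[node]:
                if nb not in depth:
                    depth[nb] = depth[node] + 1
                    n_paths[nb] = 0
                    queue.append(nb)
                if depth[nb] == depth[node] + 1:
                    n_paths[nb] += n_paths[node]
                    parents[nb].append(node)
        # Bottom-up: each node has credit 1, split among parents by path counts
        credit = {node: 1.0 for node in order}
        for node in reversed(order):
            for p in parents[node]:
                share = credit[node] * n_paths[p] / n_paths[node]
                scores[tuple(sorted((p, node)))] += share
                credit[p] += share
    # Every shortest path was counted once from each endpoint
    return {edge: s / 2 for edge, s in scores.items()}

def connected_components(graph):
    seen, comps = set(), []
    for start in graph:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            n = stack.pop()
            if n not in comp:
                comp.add(n)
                stack.extend(graph[n] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman_split(graph):
    """Remove the highest-betweenness edge(s) until the graph splits further."""
    g = {n: set(nbs) for n, nbs in graph.items()}
    n_start = len(connected_components(g))
    while len(connected_components(g)) == n_start:
        scores = edge_betweenness(g)  # recomputed after every removal
        if not scores:
            break
        top = max(scores.values())
        for (u, v), s in scores.items():
            if math.isclose(s, top):
                g[u].discard(v)
                g[v].discard(u)
    return connected_components(g)

graph = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
         "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"}}
scores = edge_betweenness(graph)          # the bridge C-D carries all 9 cross pairs
communities = girvan_newman_split(graph)  # [{A, B, C}, {D, E, F}]
```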
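For the Emulate_HDFS_MapReduce project pinned above, the MapReduce flow (map each input split, shuffle intermediate pairs into partitions by key, reduce each partition) can be sketched in a few lines. This is a minimal illustration, not the project's code: it runs map and reduce tasks on local threads rather than on separate machines, the strings in `splits` stand in for HDFS blocks, and `run_job`, the word-count task, and the crc32-based partitioner are hypothetical choices.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_phase(split):
    """Mapper: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle(mapped_outputs, n_reducers):
    """Group values by key, routing each key to one reducer via a hash partitioner."""
    partitions = [defaultdict(list) for _ in range(n_reducers)]
    for pairs in mapped_outputs:
        for key, value in pairs:
            p = zlib.crc32(key.encode()) % n_reducers  # deterministic across runs
            partitions[p][key].append(value)
    return partitions

def reduce_partition(partition):
    """Reducer: sum the counts for every key in its partition."""
    return {key: sum(values) for key, values in sorted(partition.items())}

def run_job(splits, n_workers=3, n_reducers=2):
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        mapped = list(pool.map(map_phase, splits))             # map tasks in parallel
        partitions = shuffle(mapped, n_reducers)               # shuffle / sort barrier
        reduced = list(pool.map(reduce_partition, partitions)) # reduce tasks in parallel
    result = {}
    for part in reduced:  # each key lives in exactly one partition, so no conflicts
        result.update(part)
    return result

splits = ["the quick brown fox", "the lazy dog", "The fox"]
counts = run_job(splits)
```

Partitioning by a hash of the key guarantees that all values for a given key reach the same reducer, which is what makes the reduce step correct regardless of how many reducers run.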