huy-dataguy/README.md

Hi there 👋, I'm Nguyễn Quốc Huy


🎓 Passionate Data Engineering Student

I'm a final-year Data Engineering student at HCMUTE, Vietnam. I focus on building scalable data pipelines, real-time streaming systems, and data lakehouse and warehouse architectures with modern Big Data technologies. I'm also exploring full-stack development (MERN) to complement my data engineering skills.

🔭 I’m currently studying at HCMUTE (Ho Chi Minh City University of Technology and Education)

📫 How to reach me: quochuy.working@gmail.com

🚀 What I'm working on:

  • Designing and optimizing real-time data streaming systems, data lakehouses, and data warehouses
  • Building efficient data pipelines with Apache Spark, Kafka, and Delta Lake
  • Exploring MLOps for deploying machine learning models
  • Experimenting with full-stack MERN applications

Connect with me

huyhocdata 100012067900880


Tech Stack

Apache Spark, Apache Kafka, Delta Lake, MinIO, Apache Airflow, Apache Hadoop, Trino, Docker, Apache Superset, MERN Stack, MongoDB, Express.js, React, Node.js, Python, MySQL, C#



Pinned

  1. Reddit-Streaming-Lakehouse (Public)

    Reddit data pipeline with Lakehouse architecture. Collects posts & comments, processes them (Bronze -> Silver -> Gold), performs sentiment/trend analysis, and visualizes with Superset. Built with …

    Jupyter Notebook

  2. NYC-Taxi-Lakehouse (Public)

    Real-time Big Data Streaming simulating NYC taxi trip analytics using a modern Lakehouse architecture. Ingests high-volume Parquet data into Kafka, processes it with Spark Structured Streaming, sto…

    Python 1 3

  3. Binance-realtime-analytics (Public)

    Forked from Wal-Liu/Binance-realtime-analytics

    Python

  4. Spark-on-YARN (Public)

    This repository contains the configuration and scripts necessary to run Apache Spark on a Hadoop YARN cluster in client mode. The setup allows you to leverage the scalability of YARN for distribute…

    Dockerfile

  5. Salus-Assistant-Hackathon2025 (Public)

    Full-stack AI-powered dietary assistant focusing on Vietnamese cuisine. Built in 24h at Hackathon HCMUTE 2025. Features smart meal suggestions, nutrition tracking, Gemini-powered chatbot, and meal …

    JavaScript

  6. HadoopSphere (Public)

    Forked from DOCUTEE/HaMu

    Containerized Hadoop cluster with Spark, Hive, Pig, HBase, and Zookeeper for scalable Big Data processing using Docker.

    Shell 3