A real-time, multi-user music recommendation system designed for social settings. The system aggregates playlist data from every user in a room, clusters the songs by genre, and adapts playback using live user feedback.
- Multi-user party rooms
- Genre-based clustering
- Real-time like/dislike feedback loop
- Adaptive group recommendations
- Users join a common room and authenticate via a music provider
- Playlist and track metadata are fetched
- Tracks are clustered based on genre
- Songs from dominant clusters are prioritized
- Feedback influences subsequent recommendations
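The clustering and prioritization steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual implementation: the track dictionaries, the `cluster_by_genre` and `prioritize` helpers, and the "largest cluster first" rule are all assumptions made for the example.

```python
from collections import defaultdict

def cluster_by_genre(tracks):
    """Group track titles by genre label (hypothetical record layout)."""
    clusters = defaultdict(list)
    for track in tracks:
        clusters[track["genre"]].append(track["title"])
    return dict(clusters)

def prioritize(clusters):
    """Order genres from largest to smallest cluster, then flatten into a queue.

    Ties between equally sized clusters are broken alphabetically by genre.
    """
    ordered = sorted(clusters, key=lambda g: (-len(clusters[g]), g))
    return [title for genre in ordered for title in clusters[genre]]

# Tracks pooled from three users who joined the same room (made-up data).
room_tracks = [
    {"title": "Song A", "genre": "pop", "user": "u1"},
    {"title": "Song B", "genre": "rock", "user": "u1"},
    {"title": "Song C", "genre": "pop", "user": "u2"},
    {"title": "Song D", "genre": "jazz", "user": "u2"},
    {"title": "Song E", "genre": "pop", "user": "u3"},
    {"title": "Song F", "genre": "rock", "user": "u3"},
]

clusters = cluster_by_genre(room_tracks)
queue = prioritize(clusters)
print(queue)
```

Here pop is the dominant cluster (three tracks), so its songs are queued first, followed by rock and then jazz.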
- Python 3.11
- Flask (Cloud Run REST API)
- Apache Beam (Dataflow Runner)
- Google Cloud Dataflow
- Google Cloud Storage
- Google Cloud Firestore
- Google Cloud Pub/Sub
- Google Cloud Run
- Google Cloud Functions
- Pandas
- NumPy
- SciPy
- Apache Beam transforms
- Scikit-Learn
- PyTorch
- TensorFlow Data Validation (TFDV)
- YouTube Data API v3
- iTunes Search API
- OAuth 2.0 (Google)
- React
- Firebase (Firestore real-time listeners)
- OAuth 2.0
- pytest
- pytest-cov
- unittest.mock
- GitHub Actions
- Google Cloud Build
- Docker
- Artifact Registry
- Cloud Logging
- Slack Webhooks
- GCS path-based versioning by session_id
The data pipeline is built on GCP and handles:
- Playlist ingestion via YouTube OAuth + iTunes metadata enrichment
- Distributed processing using Apache Beam on Dataflow
- Bias detection and mitigation across genre and country slices
- Real-time feedback processing via Pub/Sub streaming pipeline
- Schema validation, anomaly detection, and Slack alerting
See backend/ for full pipeline documentation.
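As a rough illustration of what checking genre and country slices could look like, the sketch below computes each slice's share of the records and flags slices under a threshold. It is plain Python rather than Beam, and the record fields, the `slice_shares` and `underrepresented` helpers, and the 20% threshold are assumptions for the example, not the pipeline's actual logic.

```python
from collections import Counter

def slice_shares(records, key):
    """Return the fraction of records falling into each value of `key`."""
    counts = Counter(record[key] for record in records)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}

def underrepresented(shares, min_share=0.2):
    """List slice values whose share is below `min_share` (hypothetical threshold)."""
    return sorted(value for value, share in shares.items() if share < min_share)

# Made-up sample: 6 pop/US tracks, 3 rock/GB tracks, 1 jazz/IN track.
records = (
    [{"genre": "pop", "country": "US"}] * 6
    + [{"genre": "rock", "country": "GB"}] * 3
    + [{"genre": "jazz", "country": "IN"}]
)

genre_shares = slice_shares(records, "genre")
country_shares = slice_shares(records, "country")
print(underrepresented(genre_shares))
print(underrepresented(country_shares))
```

A mitigation step might then re-weight or up-sample the flagged slices; how the real pipeline does this is documented in backend/.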
- Group-based genre clustering
- Feedback-weighted ranking
- Online adaptation during sessions
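One way feedback-weighted ranking with online adaptation might work is sketched below. The scoring rule (cluster size multiplied by a per-genre weight), the additive update with a fixed `step`, and the function names are illustrative assumptions, not the model used in this project.

```python
def apply_feedback(weights, genre, liked, step=0.2):
    """Return a new weight mapping after one like or dislike event.

    Unseen genres start at a neutral weight of 1.0; weights never drop below 0.
    """
    updated = dict(weights)
    delta = step if liked else -step
    updated[genre] = max(0.0, updated.get(genre, 1.0) + delta)
    return updated

def rank_genres(cluster_sizes, weights):
    """Rank genres by cluster size times feedback weight, highest score first."""
    scores = {g: size * weights.get(g, 1.0) for g, size in cluster_sizes.items()}
    return sorted(scores, key=lambda g: (-scores[g], g))

cluster_sizes = {"pop": 3, "rock": 2, "jazz": 1}
weights = {}

before = rank_genres(cluster_sizes, weights)

# Feedback arriving during the session: two dislikes on pop, one like on rock.
weights = apply_feedback(weights, "pop", liked=False)
weights = apply_feedback(weights, "pop", liked=False)
weights = apply_feedback(weights, "rock", liked=True)

after = rank_genres(cluster_sizes, weights)
print(before, after)
```

Before any feedback the ranking follows cluster size alone; after the three events rock overtakes pop, which shows how recommendations can shift mid-session without reclustering.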
- Python 3.11+
- GCP account with Cloud Run, Dataflow, Firestore, Pub/Sub, and GCS enabled
- YouTube OAuth credentials
git clone <repo_url>
cd <repo_name>
pip install -r requirements.txt

pytest tests/api_tests/ -v --cov=backend

- User-consented data only
- Session-based processing
- No long-term storage of playlists
In Development
