ETL pipeline for scraping, transforming, and loading fashion product data from the Fashion Studio website.
- ✅ Web scraping 1000+ products from 50 pages
- ✅ Data cleaning and transformation
- ✅ Currency conversion (USD → IDR)
- ✅ Export to multiple repositories (CSV, Google Sheets, PostgreSQL)
- ✅ Unit tests with coverage ≥85%
- ✅ Error handling and logging
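The cleaning and currency-conversion steps listed above can be sketched as follows. This is a minimal, stdlib-only illustration, not the project's Pandas code. Several details are assumptions because the README does not specify them: the raw field formats (`'$100.00'`, `'Rating: 4.8 / 5'`, `'3 Colors'`), the `'Unknown Product'` placeholder title, and the rate of 16,000 IDR per USD.

```python
import re
from typing import Dict, List, Optional

# ASSUMPTION: the README does not state the exchange rate; 16,000 IDR per USD is illustrative.
USD_TO_IDR = 16_000
# ASSUMPTION: placeholder title used for broken product cards.
INVALID_TITLES = {"Unknown Product"}


def parse_price_usd(raw: str) -> Optional[float]:
    """'$1,234.50' -> 1234.5; text without a number (e.g. 'Price Unavailable') -> None."""
    match = re.search(r"\d+(?:\.\d+)?", raw.replace(",", ""))
    return float(match.group()) if match else None


def parse_rating(raw: str) -> Optional[float]:
    """'Rating: 4.8 / 5' -> 4.8; no number before the '/' -> None."""
    match = re.search(r"(\d+(?:\.\d+)?)\s*/", raw)
    return float(match.group(1)) if match else None


def parse_colors(raw: str) -> Optional[int]:
    """'3 Colors' -> 3."""
    match = re.search(r"\d+", raw)
    return int(match.group()) if match else None


def transform(records: List[Dict[str, str]]) -> List[Dict]:
    """Drop invalid rows and exact duplicates, converting each kept price from USD to IDR."""
    cleaned, seen = [], set()
    for rec in records:
        if rec.get("title") in INVALID_TITLES:
            continue
        price = parse_price_usd(rec.get("price", ""))
        rating = parse_rating(rec.get("rating", ""))
        colors = parse_colors(rec.get("colors", ""))
        if price is None or rating is None or colors is None:
            continue  # unparseable field: treat the whole row as invalid
        key = (rec.get("title"), price, rating, colors)
        if key in seen:
            continue  # duplicate of a row already kept
        seen.add(key)
        cleaned.append({
            "title": rec.get("title"),
            "price_idr": price * USD_TO_IDR,
            "rating": rating,
            "colors": colors,
        })
    return cleaned
```

In Pandas, the same steps would typically map onto `Series.str.extract`, `DataFrame.dropna`, and `DataFrame.drop_duplicates`.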
- Python 3.9+
- BeautifulSoup4 - Web scraping
- Pandas - Data manipulation
- SQLAlchemy - Database connection
- Google Sheets API - Cloud storage
- Pytest - Unit testing
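This project uses BeautifulSoup4 for parsing. To keep the example dependency-free, the sketch below swaps in Python's standard `html.parser` module. It pulls the title, price, and detail lines out of each product card. The markup it expects (the `collection-card`, `product-title`, and `price` classes) is hypothetical and should be adjusted to the real page structure.

```python
from html.parser import HTMLParser
from typing import Dict, List


class ProductCardParser(HTMLParser):
    """Collect title, price, and <p> detail lines from each product card.

    ASSUMPTION: hypothetical markup, not taken from the Fashion Studio site:
      <div class="collection-card"> <h3 class="product-title">..</h3>
        <span class="price">..</span> <p>Rating: ..</p> <p>3 Colors</p> </div>
    """

    def __init__(self) -> None:
        super().__init__()
        self.products: List[Dict] = []
        self._current = None    # dict for the card currently being parsed
        self._card_depth = 0    # open <div> count inside the current card
        self._field = None      # field receiving text ("title", "price", "details")
        self._field_tag = None  # tag that opened that field

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._current is None:
            if tag == "div" and "collection-card" in classes:
                self._current = {"title": "", "price": "", "details": []}
                self._card_depth = 1
            return
        if tag == "div":
            self._card_depth += 1
        if "product-title" in classes:
            self._field, self._field_tag = "title", tag
        elif "price" in classes:
            self._field, self._field_tag = "price", tag
        elif tag == "p":
            self._field, self._field_tag = "details", tag
            self._current["details"].append("")

    def handle_endtag(self, tag):
        if self._current is None:
            return
        if tag == self._field_tag:
            self._field = self._field_tag = None
        if tag == "div":
            self._card_depth -= 1
            if self._card_depth == 0:  # closing tag of the card itself
                self.products.append(self._current)
                self._current = None

    def handle_data(self, data):
        text = data.strip()
        if self._current is None or self._field is None or not text:
            return
        if self._field == "details":
            self._current["details"][-1] += text
        else:
            self._current[self._field] += text


def parse_products(html: str) -> List[Dict]:
    parser = ProductCardParser()
    parser.feed(html)
    parser.close()
    return parser.products


SAMPLE_HTML = """
<div class="collection-card">
  <div class="product-details">
    <h3 class="product-title">T-shirt 1</h3>
    <div class="price-container"><span class="price">$100.00</span></div>
    <p>Rating: 4.8 / 5</p>
    <p>3 Colors</p>
  </div>
</div>
<div class="collection-card">
  <h3 class="product-title">Pants 2</h3>
  <p class="price">Price Unavailable</p>
</div>
"""
```

With BeautifulSoup, the equivalent card lookup would be `soup.find_all("div", class_="collection-card")`, followed by `find` calls for each field inside a card.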
# Clone repository
git clone https://github.com/YOUR_USERNAME/fashion-studio-etl-pipeline.git
cd fashion-studio-etl-pipeline
# Create virtual environment
python -m venv venv
source venv/bin/activate # Mac/Linux
# or, on Windows:
venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Run full ETL pipeline
python main.py
# Run unit tests
pytest tests/ -v
# Check test coverage
coverage run -m pytest tests/
coverage report
- CSV: products.csv (867 valid products)
- Google Sheets: [Link to your sheet]
- PostgreSQL: database fashion_products, table products
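Here is a minimal sketch of the CSV load step, including the logging and error handling mentioned in the feature list. It uses the standard `csv` module rather than Pandas, and the column set in `FIELDNAMES` is an assumption. On I/O or schema errors, the function logs and returns `False` instead of raising, so a single failed target need not abort the other loads.

```python
import csv
import logging
from typing import Dict, List

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("etl.load")

# ASSUMPTION: output columns; the real products.csv schema may differ.
FIELDNAMES = ["title", "price_idr", "rating", "colors"]


def load_to_csv(rows: List[Dict], path: str = "products.csv") -> bool:
    """Write cleaned rows to CSV; log and return False instead of crashing on errors."""
    try:
        with open(path, "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=FIELDNAMES)
            writer.writeheader()
            # DictWriter raises ValueError if a row has keys outside FIELDNAMES.
            writer.writerows(rows)
    except (OSError, ValueError) as exc:
        logger.error("Failed to write %s: %s", path, exc)
        return False
    logger.info("Wrote %d rows to %s", len(rows), path)
    return True
```

For the PostgreSQL target, the Pandas route would be `DataFrame.to_sql("products", engine, if_exists="replace", index=False)`, with an engine created by SQLAlchemy's `create_engine`. The connection URL and credentials depend on your local setup.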
Test coverage target: ≥80%. Current coverage: 85%+.
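Network calls are a common obstacle to high coverage in a scraper. One approach, sketched here under assumptions, is to mock the HTTP call so that both the success and failure paths run offline. `fetch_page` is a hypothetical helper built on `urllib`; if the project uses a different HTTP client, patch that client instead. The test functions follow pytest naming conventions, so `pytest tests/` would collect them from a file such as `tests/test_extract.py` (a hypothetical name).

```python
import urllib.error
import urllib.request
from typing import Optional
from unittest import mock


def fetch_page(url: str, timeout: float = 10.0) -> Optional[str]:
    """Hypothetical extract helper: return page HTML, or None on a URLError."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8")
    except urllib.error.URLError:
        return None


def test_fetch_page_returns_html():
    # Fake response object that also works as a context manager.
    fake_resp = mock.MagicMock()
    fake_resp.read.return_value = b"<html>ok</html>"
    fake_resp.__enter__.return_value = fake_resp
    with mock.patch("urllib.request.urlopen", return_value=fake_resp):
        assert fetch_page("https://example.com") == "<html>ok</html>"


def test_fetch_page_handles_network_error():
    # Simulate the site being unreachable.
    with mock.patch("urllib.request.urlopen", side_effect=urllib.error.URLError("down")):
        assert fetch_page("https://example.com") is None
```

Running `coverage report -m` lists the line numbers that no test executed, which helps locate any remaining gaps.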
MIT License. See the LICENSE file for details.
Fauzan Suryahadi - Dicoding Submission