DataLite is a fully functional, lightweight mini data warehouse built with Python, SQLite, Pandas, and Dash. It demonstrates a complete ETL (Extract, Transform, Load) pipeline with an interactive dashboard for analytics. Designed to run efficiently on a standard laptop (Core i3, 4GB RAM), it serves as a foundation for modern data-driven decision-making in businesses and personal projects.
The project provides practical exposure to the skills required for modern data engineering, analytics, and business intelligence workflows, and is future-ready for integration with cloud-based warehouses and AI-driven analytics.
```
DataLite/
│
├── data/
│   └── sales.csv          # Sample sales data
├── db/
│   └── warehouse.db       # SQLite database (auto-created)
├── etl/
│   ├── extract.py         # Extracts data from CSV
│   ├── transform.py       # Cleans and transforms data
│   └── load.py            # Loads data into SQLite
├── dashboards/
│   └── app.py             # Interactive Plotly Dash dashboard
├── utils/
│   └── scheduler.py       # Automates ETL on a schedule
├── run_etl.py             # Executes the full ETL pipeline
├── requirements.txt       # Project dependencies
└── README.md              # Project documentation
```
```bash
git clone <your-repo-url>
cd DataLite
python -m venv venv
.\venv\Scripts\activate      # Windows
source venv/bin/activate     # Linux/Mac
pip install -r requirements.txt
```

Dependencies include: pandas, numpy, dash, plotly.
```bash
python run_etl.py
```

- Reads `data/sales.csv`
- Cleans and transforms the data
- Loads it into `db/warehouse.db`
- Output: `ETL completed successfully!`
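The pipeline above might look like the following minimal sketch. It is not the project's actual code: the column names (`product`, `quantity`, `date`) and the cleaning rules are assumptions about what `data/sales.csv` contains.

```python
import sqlite3

import pandas as pd


def extract(csv_path: str) -> pd.DataFrame:
    """Read the raw sales CSV into a DataFrame."""
    return pd.read_csv(csv_path)


def transform(df: pd.DataFrame) -> pd.DataFrame:
    """Drop incomplete rows and normalize types (assumed columns)."""
    df = df.dropna(subset=["product", "quantity", "date"])
    df = df.assign(
        quantity=df["quantity"].astype(int),
        date=pd.to_datetime(df["date"]).dt.strftime("%Y-%m-%d"),
    )
    return df


def load(df: pd.DataFrame, db_path: str) -> None:
    """Replace the `sales` table in the SQLite warehouse."""
    with sqlite3.connect(db_path) as conn:
        df.to_sql("sales", conn, if_exists="replace", index=False)
```

A `run_etl.py`-style entry point would simply chain the three steps: `load(transform(extract("data/sales.csv")), "db/warehouse.db")`, then print the success message.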
```bash
python dashboards/app.py
```

- Open in browser: http://127.0.0.1:8050/
- Features:
  - Product selection dropdown
  - Date range filter
  - Sales bar chart with quantity totals
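Behind controls like these, a dashboard callback typically filters the warehouse with a parameterized query. The following is a hedged sketch of such a helper; the `sales` table layout and column names are assumptions, not the project's confirmed schema.

```python
import sqlite3

import pandas as pd


def query_sales(db_path: str, product: str, start: str, end: str) -> pd.DataFrame:
    """Return quantity totals per date for one product within a date range."""
    sql = """
        SELECT date, SUM(quantity) AS total_quantity
        FROM sales
        WHERE product = ? AND date BETWEEN ? AND ?
        GROUP BY date
        ORDER BY date
    """
    with sqlite3.connect(db_path) as conn:
        # Placeholders keep user-selected filter values out of the SQL string.
        return pd.read_sql_query(sql, conn, params=(product, start, end))
```

The returned DataFrame maps directly onto a Plotly bar chart (dates on the x-axis, `total_quantity` on the y-axis), which is how the dropdown and date-range filter would drive the displayed totals.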
```bash
python utils/scheduler.py
```

- Automatically runs ETL on a configurable interval (default: every 60 seconds)
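A scheduler like this can be as simple as a sleep loop. This sketch shows one possible shape (the function name and `max_runs` escape hatch are illustrative, not taken from `utils/scheduler.py`):

```python
import time
from typing import Callable, Optional


def run_periodically(
    job: Callable[[], None],
    interval_seconds: float = 60.0,
    max_runs: Optional[int] = None,
) -> int:
    """Run `job` every `interval_seconds`; stop after `max_runs` if given.

    Returns the number of completed runs.
    """
    runs = 0
    while max_runs is None or runs < max_runs:
        job()  # e.g. the full extract -> transform -> load pipeline
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        time.sleep(interval_seconds)
    return runs
```

In production you would more likely reach for cron, `APScheduler`, or an orchestrator such as Airflow, but a loop like this is enough for a laptop-scale demo.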
- Data-Driven Decisions: Enables small businesses and analysts to understand sales trends and optimize inventory, marketing, and operations.
- Skill Development: Provides hands-on experience with Python-based ETL, SQL, Pandas, and interactive dashboards—skills that remain in demand through 2030.
- Scalable Foundation: Can evolve from SQLite to cloud warehouses like Snowflake or BigQuery, preparing you for enterprise-level analytics.
- Automation & AI Ready: The scheduler and ETL framework can integrate with machine learning models and AI-driven reporting, supporting predictive analytics and automated insights.
- Lightweight & Accessible: Runs locally on standard hardware while teaching modern data engineering and visualization practices.
- Future Expansion: Add more data sources, dashboards, and analytics modules to simulate real-world business intelligence projects and pipelines.
- Keep `venv/` in `.gitignore` for GitHub deployment.
- Always `cd` into the project folder containing `run_etl.py` before running scripts.
- Update `data/sales.csv` for new ETL runs; the dashboard reflects the updated data automatically.
- Future enhancements: integrate PostgreSQL, Snowflake, or AI analytics tools; add multiple dashboards; expand ETL sources.