This project is currently under development.
This project, InsightFlow, focuses on developing an end-to-end machine learning pipeline for predicting house prices using property features such as size and location. The pipeline integrates MLOps principles and utilizes tools like ZenML and MLflow to ensure scalability, reproducibility, and robustness.
Data Ingestion:
- Abstracted the data ingestion process using the Factory Design Pattern.
- CSV files are extracted from .zip archives and read into Pandas DataFrames.
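The ingestion step above can be sketched as follows. This is a minimal illustration of the Factory pattern, not the project's actual src/ingest_data.py. The class and method names (DataIngestor, ZipDataIngestor, DataIngestorFactory, get_data_ingestor) and the default extract_dir are assumptions. The sketch also folds in the check that the archive holds exactly one CSV file.

```python
import os
import zipfile
from abc import ABC, abstractmethod

import pandas as pd


# Hypothetical names: the project's src/ingest_data.py may use different ones.
class DataIngestor(ABC):
    """Common interface that every concrete ingestor implements."""

    @abstractmethod
    def ingest(self, file_path: str) -> pd.DataFrame:
        """Load the file at file_path into a DataFrame."""


class ZipDataIngestor(DataIngestor):
    """Extracts a .zip archive and reads the single CSV file it contains."""

    def __init__(self, extract_dir: str = "extracted_data") -> None:
        self.extract_dir = extract_dir

    def ingest(self, file_path: str) -> pd.DataFrame:
        if not file_path.endswith(".zip"):
            raise ValueError("ZipDataIngestor expects a .zip file")

        # Unpack the archive into the extraction directory (created if missing).
        with zipfile.ZipFile(file_path, "r") as archive:
            archive.extractall(self.extract_dir)

        # Require exactly one top-level CSV so the file choice is never ambiguous.
        csv_files = [f for f in os.listdir(self.extract_dir) if f.endswith(".csv")]
        if len(csv_files) != 1:
            raise ValueError(f"Expected exactly one CSV file, found {len(csv_files)}")

        return pd.read_csv(os.path.join(self.extract_dir, csv_files[0]))


class DataIngestorFactory:
    """Returns the ingestor that matches a file extension."""

    @staticmethod
    def get_data_ingestor(file_extension: str) -> DataIngestor:
        if file_extension == ".zip":
            return ZipDataIngestor()
        raise ValueError(f"No ingestor available for file extension: {file_extension}")
```

For example, `DataIngestorFactory.get_data_ingestor(".zip").ingest("data/archive.zip")` would extract the archive into extracted_data/ and return its CSV as a DataFrame. Supporting another format would mean adding one DataIngestor subclass and one branch in the factory, while callers stay unchanged.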
Data Inspection:
- Implemented the Strategy Design Pattern to support multiple inspection strategies:
  - Data Type Inspection: Displays data types and non-null counts.
  - Summary Statistics Inspection: Provides statistical summaries for numerical and categorical columns.
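The inspection strategies above can be sketched as interchangeable classes behind one interface. This is a minimal illustration under assumed names (DataInspectionStrategy, DataInspector, set_strategy, execute_inspection), not the project's basic_data_inspection.py. The source describes the strategies as displaying their output; this sketch returns DataFrames instead so the results can be reused or tested.

```python
from abc import ABC, abstractmethod

import pandas as pd


# Hypothetical names: the project's basic_data_inspection.py may use different ones.
class DataInspectionStrategy(ABC):
    """Interface shared by all inspection strategies."""

    @abstractmethod
    def inspect(self, df: pd.DataFrame) -> pd.DataFrame:
        """Return a summary table for df."""


class DataTypeInspectionStrategy(DataInspectionStrategy):
    """Data type and non-null count for each column."""

    def inspect(self, df: pd.DataFrame) -> pd.DataFrame:
        return pd.DataFrame(
            {"dtype": df.dtypes.astype(str), "non_null": df.notna().sum()}
        )


class SummaryStatisticsInspectionStrategy(DataInspectionStrategy):
    """Descriptive statistics; include="all" covers numerical and categorical columns."""

    def inspect(self, df: pd.DataFrame) -> pd.DataFrame:
        return df.describe(include="all")


class DataInspector:
    """Context object: delegates to whichever strategy is currently set."""

    def __init__(self, strategy: DataInspectionStrategy) -> None:
        self._strategy = strategy

    def set_strategy(self, strategy: DataInspectionStrategy) -> None:
        self._strategy = strategy

    def execute_inspection(self, df: pd.DataFrame) -> pd.DataFrame:
        return self._strategy.inspect(df)
```

Switching from the data type view to summary statistics is then a single `inspector.set_strategy(SummaryStatisticsInspectionStrategy())` call, and a new kind of inspection only needs a new subclass.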
Missing Values Analysis:
- Utilized the Template Method Design Pattern for a structured approach to analyzing missing values.
- Two key tasks:
  - Identify Missing Values: Counts missing values in each column.
  - Visualize Missing Values: Displays a heatmap to visualize missing data.
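The two tasks above fit the Template Method pattern: a base class fixes the order of the steps and subclasses fill them in. The sketch below uses hypothetical names (MissingValuesAnalysisTemplate, SimpleMissingValuesAnalysis, analyze) that may differ from the project's missing_values_analysis.py, and draws the Seaborn heatmap the project describes. It needs pandas, plus matplotlib and seaborn only when the plot is drawn.

```python
from abc import ABC, abstractmethod

import pandas as pd


# Hypothetical names: the project's missing_values_analysis.py may differ.
class MissingValuesAnalysisTemplate(ABC):
    def analyze(self, df: pd.DataFrame) -> pd.Series:
        """Template method: runs the steps in a fixed order."""
        missing = self.identify_missing_values(df)
        self.visualize_missing_values(df)
        return missing

    @abstractmethod
    def identify_missing_values(self, df: pd.DataFrame) -> pd.Series:
        """Return per-column counts of missing values."""

    @abstractmethod
    def visualize_missing_values(self, df: pd.DataFrame) -> None:
        """Render the missing-value pattern."""


class SimpleMissingValuesAnalysis(MissingValuesAnalysisTemplate):
    def identify_missing_values(self, df: pd.DataFrame) -> pd.Series:
        counts = df.isnull().sum()
        # Keep only the columns that actually contain missing values.
        return counts[counts > 0]

    def visualize_missing_values(self, df: pd.DataFrame) -> None:
        # Imported here so identification works even without plotting libraries.
        import matplotlib.pyplot as plt
        import seaborn as sns

        plt.figure(figsize=(12, 8))
        # Each cell is True where a value is missing.
        sns.heatmap(df.isnull(), cbar=False, cmap="viridis")
        plt.title("Missing Values Heatmap")
        plt.show()
```

Calling `SimpleMissingValuesAnalysis().analyze(df)` returns the counts and shows the heatmap. A different report, such as percentages instead of counts, would override only the step that changes while `analyze` keeps the overall sequence.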
Project Structure:
└── dhananjay6561-InsightFlow/
    ├── README.md
    ├── requirements.txt
    ├── analysis/
    │   ├── EDA.ipynb
    │   └── analyze_src/
    │       ├── basic_data_inspection.py
    │       └── missing_values_analysis.py
    ├── data/
    │   └── archive.zip
    ├── extracted_data/
    │   └── AmesHousing.csv
    └── src/
        └── ingest_data.py
Installation:
- Create a Python virtual environment:
python -m venv insightflow
- Activate the virtual environment:
- Windows:
.\insightflow\Scripts\activate
- Linux/Mac:
source insightflow/bin/activate
- Install required packages:
pip install -r requirements.txt
Usage:
- Data Ingestion:
python src/ingest_data.py
- Data Inspection:
python analysis/analyze_src/basic_data_inspection.py
- Missing Values Analysis:
python analysis/analyze_src/missing_values_analysis.py
Data Ingestion (src/ingest_data.py):
- Extracts CSV files from .zip archives.
- Ensures only one .csv file exists in the archive.
Data Inspection (analysis/analyze_src/basic_data_inspection.py):
- Strategy Design Pattern:
  - DataTypeInspectionStrategy: Displays data types and non-null counts.
  - SummaryStatisticsInspectionStrategy: Shows descriptive statistics for numerical and categorical columns.
Missing Values Analysis (analysis/analyze_src/missing_values_analysis.py):
- Template Method Design Pattern:
  - Abstracted the process of identifying and visualizing missing values.
  - A concrete implementation visualizes missing values using Seaborn's heatmap.
We welcome contributions to make InsightFlow even better! Here's how you can contribute:
Fork the Repository:
- Click the Fork button in the top-right corner of this repository.
Clone Your Fork:
git clone https://github.com/<your-username>/InsightFlow.git
Create a Feature Branch:
git checkout -b feature/your-feature-name
Make Changes and Commit:
- Make your desired changes.
- Stage and commit your changes:
git add .
git commit -m "Add your commit message here"
Push Changes and Submit Pull Request:
git push origin feature/your-feature-name
- Go to the original repository on GitHub and open a Pull Request from your feature branch.
Wait for Review:
- Your changes will be reviewed, and feedback may be provided.
Thank you for checking out InsightFlow! We are committed to making machine learning workflows more efficient and reproducible. If you have any suggestions or questions, feel free to open an issue or reach out. Together, we can make this project even better!