Code Directory

This directory contains the core Python scripts that implement the data pipeline and machine learning model for the student depression dataset analysis project. The scripts handle data ingestion, processing, visualization, and predictive modeling, and use Snowflake for data storage and retrieval.

ingest.py
- Description: Ingests raw data from data/student_depression_dataset.csv into the Snowflake BRONZE_STUDENT_DATA table and tracks lineage metadata in DATA_LINEAGE.
- Usage: python code/ingest.py
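At its core, the ingestion step reads the CSV and records where each load came from. The sketch below shows one way to do that in plain Python. It is not the script's actual code, and the lineage field names (source_file, target_table, row_count, sha256, loaded_at) are hypothetical, because the DATA_LINEAGE schema is not documented here.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def read_raw_rows(csv_path: Path) -> list[dict[str, str]]:
    """Read the raw CSV into a list of dicts, one per data row (header excluded)."""
    with csv_path.open(newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def build_lineage_record(csv_path: Path, target_table: str, row_count: int) -> dict:
    """Describe one load: source file, destination table, row count, content hash.

    Field names are hypothetical; adapt them to the real DATA_LINEAGE columns.
    """
    # Hash the file bytes so a later run can tell whether the source changed.
    digest = hashlib.sha256(csv_path.read_bytes()).hexdigest()
    return {
        "source_file": str(csv_path),
        "target_table": target_table,
        "row_count": row_count,
        "sha256": digest,
        "loaded_at": datetime.now(timezone.utc).isoformat(),
    }
```

The rows could then be written to BRONZE_STUDENT_DATA (for example with `write_pandas` from `snowflake.connector.pandas_tools`, after loading them into a pandas DataFrame), and the lineage dict inserted into DATA_LINEAGE.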
process.py
- Description: Transforms bronze data into silver (SILVER_STUDENT_DATA) and gold (GOLD_STUDENT_INSIGHTS) layers in Snowflake with cleaning and aggregation steps.
- Usage: python code/process.py
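The README does not spell out the cleaning or aggregation rules. As an illustration only, assume hypothetical column names Gender, Age, Academic Pressure, and Depression (0/1). Under that assumption, a bronze-to-silver step might trim values, drop incomplete rows, and cast numbers, and a silver-to-gold step might summarize each gender group. In Snowflake the same logic could equally be expressed in SQL.

```python
from collections import defaultdict

# Hypothetical column names; the real bronze schema may differ.
REQUIRED = ("Gender", "Age", "Academic Pressure", "Depression")


def to_silver(bronze_rows):
    """Clean raw rows: trim strings, drop incomplete rows, cast numeric fields."""
    silver = []
    for row in bronze_rows:
        values = {k: (row.get(k) or "").strip() for k in REQUIRED}
        if any(v == "" for v in values.values()):
            continue  # skip rows missing a required field
        try:
            silver.append({
                "gender": values["Gender"],
                "age": float(values["Age"]),
                "academic_pressure": float(values["Academic Pressure"]),
                "depression": int(values["Depression"]),
            })
        except ValueError:
            continue  # skip rows whose numeric fields do not parse
    return silver


def to_gold(silver_rows):
    """Aggregate cleaned rows per gender: student count, depression rate, mean pressure."""
    groups = defaultdict(list)
    for row in silver_rows:
        groups[row["gender"]].append(row)
    gold = []
    for gender, rows in sorted(groups.items()):
        n = len(rows)
        gold.append({
            "gender": gender,
            "students": n,
            "depression_rate": sum(r["depression"] for r in rows) / n,
            "avg_academic_pressure": sum(r["academic_pressure"] for r in rows) / n,
        })
    return gold
```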
visualize.py
- Description: Generates visualizations (bar plots, scatter plots, histograms) from the gold and silver layers, saving them to example/.
- Usage: python code/visualize.py
model.py
- Description: Trains a Random Forest Classifier to predict depression using features from the SILVER_STUDENT_DATA table (e.g., age, academic pressure, gender). Saves the trained model to model/depression_model.joblib.
- Usage: python code/model.py
- Output: A trained model file (model/depression_model.joblib) and performance metrics logged to pipeline.log.
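The README does not say which performance metrics model.py logs. As an illustration, assume accuracy, precision, recall, and F1 for the positive (depressed) class. These can be computed from true and predicted labels and written to the log with the standard logging module. In practice scikit-learn's `sklearn.metrics` functions (`accuracy_score`, `precision_score`, `recall_score`, `f1_score`) would do the computation. The stdlib version below only makes the arithmetic explicit, and its helper names are hypothetical.

```python
import logging


def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    # Guard each ratio so an empty denominator yields 0.0 instead of an error.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": correct / len(pairs) if pairs else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def log_metrics(logger: logging.Logger, metrics: dict) -> None:
    """Emit one INFO line per metric, e.g. 'accuracy=0.8123'."""
    for name, value in metrics.items():
        logger.info("%s=%.4f", name, value)
```

To send these lines to pipeline.log, the script would configure logging once, for example with `logging.basicConfig(filename="pipeline.log", level=logging.INFO)`, before logging the test-set scores.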
config.ini
- Description: Configuration file with Snowflake credentials and settings (e.g., user, password, account, database).
- Note: Ensure this file is populated with valid credentials before running the pipeline.
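The layout of config.ini is not shown. Assuming a single `[snowflake]` section (the section name is an assumption) holding the keys listed above, the standard-library `configparser` can load it and fail early when a key is missing. The resulting dict could then be passed as keyword arguments to `snowflake.connector.connect`.

```python
import configparser

# Keys named in the README; the section name "snowflake" is an assumption.
REQUIRED_KEYS = ("user", "password", "account", "database")


def parse_snowflake_settings(text, section="snowflake"):
    """Parse INI text and return the section as a dict, checking required keys."""
    # interpolation=None so a '%' in a password is read literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section(section):
        raise KeyError(f"missing [{section}] section")
    settings = dict(parser[section])
    missing = [k for k in REQUIRED_KEYS if not settings.get(k)]
    if missing:
        raise ValueError(f"missing or empty keys in [{section}]: {', '.join(missing)}")
    return settings


def load_snowflake_settings(path="config.ini", section="snowflake"):
    """Read and validate the settings file from disk."""
    with open(path, encoding="utf-8") as f:
        return parse_snowflake_settings(f.read(), section)
```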
__init__.py
- Description: Empty file to make code/ a Python package, enabling modular imports if needed.

Project Overview

The scripts in this directory form a four-stage pipeline for the student depression data:

- Ingestion: Loads raw CSV data into Snowflake’s bronze layer (ingest.py).
- Processing: Cleans and aggregates data into silver and gold layers (process.py).
- Visualization: Produces plots from the silver and gold layers and saves them in example/ (visualize.py).
- Modeling: Trains a machine learning model to predict depression from the cleaned silver data (model.py).

Instructions

Setup:

Install dependencies:

pip install snowflake-connector-python pandas sqlalchemy snowflake-sqlalchemy matplotlib seaborn scikit-learn joblib

Configure config.ini with your Snowflake credentials.
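Several pip names above do not match their import names (scikit-learn is imported as `sklearn`, snowflake-connector-python as `snowflake.connector`). A small check like the following, a hypothetical helper not included in the repository, can report which packages are still missing before any script is run.

```python
import importlib.util

# pip package name -> import name (they differ for several of these packages).
PACKAGES = {
    "snowflake-connector-python": "snowflake.connector",
    "pandas": "pandas",
    "sqlalchemy": "sqlalchemy",
    "snowflake-sqlalchemy": "snowflake.sqlalchemy",
    "matplotlib": "matplotlib",
    "seaborn": "seaborn",
    "scikit-learn": "sklearn",
    "joblib": "joblib",
}


def missing_packages(packages=PACKAGES):
    """Return the pip names whose import module cannot be found."""
    missing = []
    for pip_name, module in packages.items():
        try:
            found = importlib.util.find_spec(module) is not None
        except ImportError:
            # find_spec imports the parent of a dotted name; a missing
            # parent (e.g. "snowflake") raises instead of returning None.
            found = False
        if not found:
            missing.append(pip_name)
    return missing
```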

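Run the Pipeline (from the repository root, in this order, since later stages read the tables written by earlier ones):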

python code/ingest.py    # Ingest raw data
python code/process.py   # Process data into silver and gold layers
python code/visualize.py # Generate visualizations
python code/model.py     # Train and save the ML model
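Running the four commands by hand leaves it to the user to notice a failure before starting the next stage. A hypothetical runner, not part of the repository, can stop at the first failing stage instead of continuing on stale tables:

```python
import subprocess
import sys

# Stage order from the README; later stages read tables written by earlier ones.
STAGES = ("code/ingest.py", "code/process.py", "code/visualize.py", "code/model.py")


def run_pipeline(stages=STAGES, run=subprocess.run):
    """Run each stage with the current interpreter; return the failing stage or None."""
    for script in stages:
        print(f"running {script}")
        result = run([sys.executable, script])
        if result.returncode != 0:
            print(f"{script} failed with exit code {result.returncode}", file=sys.stderr)
            return script
    return None
```

Calling `run_pipeline()` from the repository root runs the same commands as above. The `run` parameter exists only so the stage logic can be exercised without Snowflake.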
