Example Directory

This directory contains output visualizations generated by the visualize.py script as part of the data pipeline for the student depression dataset analysis. These images demonstrate the results of the pipeline’s visualization step, showcasing insights derived from the processed data in Snowflake.

Contents

  • depression_rate.png

    • Description: A bar plot showing the depression rate by gender, derived from the GOLD_STUDENT_INSIGHTS table.
    • Purpose: Highlights differences in depression rates between male and female students.
  • cgpa_pressure.png

    • Description: A scatter plot illustrating the relationship between CGPA and academic pressure, colored by gender, from the SILVER_STUDENT_DATA table.
    • Purpose: Visualizes how academic pressure correlates with academic performance (CGPA) across genders.
  • age_distribution.png

    • Description: A histogram with a KDE showing the age distribution of students, sourced from the SILVER_STUDENT_DATA table.
    • Purpose: Provides an overview of the age demographics of the student population.
  • README.md (this file)

    • Description: Documentation of the contents and pipeline run examples.
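
To illustrate how a figure such as depression_rate.png could be produced, the sketch below turns gold-layer rows into bar-chart data and saves the plot. It is not the actual visualize.py. The column names GENDER and DEPRESSION_RATE, both function names, and the use of plain matplotlib are assumptions made for the example.

```python
from pathlib import Path


def bar_data(gold_rows):
    """Return (labels, rates) sorted by gender label.

    Assumes each row is a dict with the hypothetical keys GENDER and
    DEPRESSION_RATE (a fraction between 0 and 1).
    """
    ordered = sorted(gold_rows, key=lambda row: row["GENDER"])
    labels = [row["GENDER"] for row in ordered]
    rates = [float(row["DEPRESSION_RATE"]) for row in ordered]
    return labels, rates


def save_depression_rate_plot(gold_rows, out_path="depression_rate.png"):
    """Draw one bar per gender and write the figure to out_path."""
    # Imported here so bar_data() stays usable without matplotlib installed.
    import matplotlib
    matplotlib.use("Agg")  # render straight to a file, no display required
    import matplotlib.pyplot as plt

    labels, rates = bar_data(gold_rows)
    fig, ax = plt.subplots()
    ax.bar(labels, rates)
    ax.set_xlabel("Gender")
    ax.set_ylabel("Depression rate")
    ax.set_title("Depression Rate by Gender")
    fig.savefig(Path(out_path))
    plt.close(fig)
    return Path(out_path)
```

Keeping the data preparation separate from the drawing code means the values behind the chart can be checked without rendering an image.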

Pipeline Run Examples

Below are examples of a working pipeline run, with screenshots demonstrating functionality.

1. Successful Ingestion (ingest.py)

  • Description: Loads raw CSV data into BRONZE_STUDENT_DATA and tracks lineage.
  • Screenshot: Ingestion Logs
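
As a rough idea of what "tracks lineage" can mean at this step, the sketch below parses CSV text and tags every row with where it came from and when it was read. The lineage field names (_SOURCE_FILE, _SOURCE_SHA256, _LOADED_AT) and the function name are hypothetical, and the upload into BRONZE_STUDENT_DATA is left out. The real ingest.py may track lineage differently.

```python
import csv
import hashlib
import io
from datetime import datetime, timezone


def read_with_lineage(csv_text, source_name):
    """Parse CSV text and attach simple lineage metadata to each row.

    All three lineage fields are illustrative assumptions, not the
    columns used by the actual pipeline.
    """
    # One hash per file lets later steps detect that a source file changed.
    digest = hashlib.sha256(csv_text.encode("utf-8")).hexdigest()
    loaded_at = datetime.now(timezone.utc).isoformat()

    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        record["_SOURCE_FILE"] = source_name
        record["_SOURCE_SHA256"] = digest
        record["_LOADED_AT"] = loaded_at
        rows.append(record)
    return rows
```

Every row from the same file shares one hash and one load timestamp, so a bronze row can be traced back to the exact file version it was read from.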

2. Data Processing (process.py)

  • Description: Transforms bronze data into silver and gold layers.
  • Screenshot: Processing Logs
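
The bronze to silver to gold flow can be sketched in plain Python as a cleaning step followed by an aggregation. This is a minimal illustration, not the logic in process.py. The bronze columns Gender and Depression (stored as "0"/"1" text), the output column names, and the cleaning rules are all assumptions.

```python
from collections import defaultdict


def to_silver(bronze_rows):
    """Clean raw rows: drop unusable ones and normalise values.

    Assumes hypothetical bronze columns Gender and Depression ("0" or "1").
    """
    silver = []
    for row in bronze_rows:
        gender = (row.get("Gender") or "").strip().title()
        flag = (row.get("Depression") or "").strip()
        if not gender or flag not in {"0", "1"}:
            continue  # skip rows that cannot be interpreted
        silver.append({"GENDER": gender, "DEPRESSION": int(flag)})
    return silver


def to_gold(silver_rows):
    """Aggregate silver rows into one depression rate per gender."""
    totals = defaultdict(lambda: [0, 0])  # gender -> [depressed, total]
    for row in silver_rows:
        totals[row["GENDER"]][0] += row["DEPRESSION"]
        totals[row["GENDER"]][1] += 1
    return [
        {"GENDER": gender, "DEPRESSION_RATE": depressed / total}
        for gender, (depressed, total) in sorted(totals.items())
    ]
```

The silver step only validates and normalises individual rows, while the gold step is the only place that aggregates across rows.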

3. Visualization Output (visualize.py)

  • Description: Generates the visualizations above.
  • Screenshot: Visualization Logs

4. Model Training (model.py)

  • Description: Trains a depression prediction model, saving it to model/depression_model.joblib.
  • Screenshot: Model Training Logs

How to Generate These Outputs

  1. Run python code/ingest.py
  2. Run python code/process.py
  3. Run python code/visualize.py
  4. Run python code/model.py
  5. Check this directory for the images and model/ for the trained model.
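
The four commands above can also be chained from one small script that stops at the first failing step. This is a convenience sketch and not part of the repository. The names STEPS, build_commands, and run_pipeline are made up for the example, and it assumes the script is started from the repository root.

```python
import subprocess
import sys

# Script order taken from the steps above; paths are relative to the repo root.
STEPS = [
    "code/ingest.py",
    "code/process.py",
    "code/visualize.py",
    "code/model.py",
]


def build_commands(steps=STEPS, python=sys.executable):
    """Return one command list per step, using the current interpreter."""
    return [[python, step] for step in steps]


def run_pipeline(commands):
    """Run commands in order and stop at the first non-zero exit code.

    Returns the list of commands that completed successfully.
    """
    completed = []
    for cmd in commands:
        result = subprocess.run(cmd)
        if result.returncode != 0:
            print(f"Step failed: {' '.join(cmd)} (exit code {result.returncode})")
            break
        completed.append(cmd)
    return completed
```

Calling run_pipeline(build_commands()) runs all four scripts in sequence. Because later steps depend on the tables written by earlier ones, stopping at the first failure avoids building visualizations or a model from stale data.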

Notes

  • Screenshots are placeholders. Replace with actual terminal outputs or file explorer views after running the pipeline (e.g., logs from pipeline.log or folder contents).
  • See the root README.md for project details and code/README.md for script instructions.