Comprehensive analysis of NYC 311 service request data to identify complaint patterns, response times, and geographic distributions across New York City boroughs.
```bash
# 1. Install dependencies
pip install -r requirements.txt

# 2. Run the full analysis pipeline
python main.py
```

This generates:
- nyc311_profile.html - Interactive HTML profiling report with all charts embedded
- Report.pdf - Executive summary
Project structure:

```
GET_305_Data_Analysis/
├── main.py                  # ⭐ MAIN ENTRY POINT
├── requirements.txt         # Python dependencies
├── README.md                # This file
├── NYC311_analysis.ipynb    # Detailed Jupyter notebook with statistics
├── nyc311_sql_tasks.sql     # SQL cleaning queries
├── setup_database.py        # Database setup module
├── generate_dashboard.py    # HTML profiling report generator
├── generate_report.py       # PDF report module
├── nyc311_profile.html      # Generated profiling report
├── Report.pdf               # Generated PDF report
└── .gitignore               # Git ignore rules
```
Command-line options:

```bash
python main.py              # Run the full pipeline
python main.py --setup      # Set up the database only
python main.py --dashboard  # Generate the HTML profiling report only
python main.py --report     # Generate the PDF report only
python main.py --help       # Show all options
```

For interactive analysis with statistics:
```bash
jupyter notebook NYC311_analysis.ipynb
```

Output files:

| File | Description |
|---|---|
| `nyc311.db` | SQLite database with raw and cleaned data |
| `nyc311_profile.html` | 📊 HTML profiling report with embedded charts |
| `Report.pdf` | Executive summary PDF |
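`main.py` itself is not reproduced here, so the following is only a minimal sketch of how its flags could be dispatched, assuming the standard-library `argparse` module and assuming that running with no flags executes every stage in order. The three stage functions are hypothetical placeholders, not the actual functions in `setup_database.py`, `generate_dashboard.py` and `generate_report.py`.

```python
import argparse
from typing import List, Optional


# Hypothetical placeholders for the real stage functions.
def setup_database() -> str:
    return "setup"


def generate_dashboard() -> str:
    return "dashboard"


def generate_report() -> str:
    return "report"


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="NYC 311 analysis pipeline")
    parser.add_argument("--setup", action="store_true", help="set up the database only")
    parser.add_argument("--dashboard", action="store_true",
                        help="generate the HTML profiling report only")
    parser.add_argument("--report", action="store_true",
                        help="generate the PDF report only")
    return parser


def run(argv: Optional[List[str]] = None) -> List[str]:
    """Run the selected stages and return their names in execution order."""
    args = build_parser().parse_args(argv)
    # No flags given: run every stage, in pipeline order.
    run_all = not (args.setup or args.dashboard or args.report)
    completed = []
    if run_all or args.setup:
        completed.append(setup_database())
    if run_all or args.dashboard:
        completed.append(generate_dashboard())
    if run_all or args.report:
        completed.append(generate_report())
    return completed
```

In a real entry point, `run()` would typically be called under an `if __name__ == "__main__":` guard. `argparse` adds `--help` automatically, which is consistent with the `python main.py --help` option.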
The `nyc311_profile.html` report includes:
- 📈 Time series of complaint volume
- 📋 Top 10 complaint types
- 🗺️ Geographic distribution map
- ⏱️ Response time analysis
- 📊 Borough comparison
- 🕐 Hourly patterns
- 📉 Data quality statistics
All charts are embedded directly in the HTML file, so no separate image files are needed.
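One common way to produce a single self-contained HTML file is to inline each chart as a base64 `data:` URI. The sketch below assumes each chart has already been rendered to PNG bytes; `embed_png` and `build_page` are illustrative names, not functions taken from `generate_dashboard.py`.

```python
import base64
import html


def embed_png(png_bytes: bytes, alt: str) -> str:
    """Return an <img> tag whose image data is inlined as a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f'<img alt="{html.escape(alt)}" src="data:image/png;base64,{encoded}">'


def build_page(charts: dict[str, bytes]) -> str:
    """Assemble one self-contained HTML page from a {title: png_bytes} mapping."""
    sections = [
        f"<h2>{html.escape(title)}</h2>\n{embed_png(data, title)}"
        for title, data in charts.items()
    ]
    body = "\n".join(sections)
    return f"<!DOCTYPE html>\n<html><body>\n{body}\n</body></html>"
```

With matplotlib, for example, the PNG bytes could come from `fig.savefig(buf, format="png")` into an `io.BytesIO` buffer, followed by `embed_png(buf.getvalue(), "Top 10 complaint types")`. Inlining keeps the report portable at the cost of a larger file, since base64 encoding adds roughly a third to each image's size.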
Data pipeline:

```
Raw CSV → SQLite (raw_311) → SQL Cleaning → 311_cleaned → Profiling + Report
```
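The first two pipeline stages can be sketched with the standard-library `csv` and `sqlite3` modules. This is a minimal illustration only: the column subset and the cleaning rules (trim and upper-case text, drop exact duplicates and rows whose borough is missing or `UNSPECIFIED`) are assumptions, and the project's actual rules live in `nyc311_sql_tasks.sql`. Because `311_cleaned` begins with a digit, SQLite only accepts it as a quoted identifier.

```python
import csv
import sqlite3
from typing import Iterable

# Hypothetical subset of 311 columns; the real export has many more.
COLUMNS = ["unique_key", "created_date", "closed_date", "complaint_type", "borough"]


def load_raw(conn: sqlite3.Connection, lines: Iterable[str]) -> int:
    """Stage 1: copy CSV rows as-is into the raw_311 staging table."""
    col_defs = ", ".join(f"{c} TEXT" for c in COLUMNS)
    conn.execute(f"CREATE TABLE IF NOT EXISTS raw_311 ({col_defs})")
    reader = csv.DictReader(lines)
    # Columns absent from the CSV header are stored as NULL.
    rows = [tuple(row.get(c) for c in COLUMNS) for row in reader]
    placeholders = ", ".join("?" for _ in COLUMNS)
    conn.executemany(f"INSERT INTO raw_311 VALUES ({placeholders})", rows)
    return len(rows)


def clean(conn: sqlite3.Connection) -> int:
    """Stage 2: rebuild "311_cleaned" from raw_311 using example cleaning rules."""
    # The table name starts with a digit, so it must be double-quoted.
    conn.executescript("""
        DROP TABLE IF EXISTS "311_cleaned";
        CREATE TABLE "311_cleaned" AS
        SELECT DISTINCT                      -- drop rows that are identical after normalization
            unique_key,
            created_date,
            closed_date,
            UPPER(TRIM(complaint_type)) AS complaint_type,
            UPPER(TRIM(borough))        AS borough
        FROM raw_311
        WHERE borough IS NOT NULL
          AND UPPER(TRIM(borough)) NOT IN ('', 'UNSPECIFIED');
    """)
    return conn.execute('SELECT COUNT(*) FROM "311_cleaned"').fetchone()[0]
```

For a file-backed database, `load_raw` could be fed an open file (`open(path, newline="")`), followed by `conn.commit()` once both stages have run.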
Key findings:

- Brooklyn has the highest complaint volume (~118,864 requests)
- HEAT/HOT WATER is the most common complaint type
- Significant differences in response times across boroughs (p < 0.05)
- Statistically significant association between complaint type and borough (chi-square test, p < 0.001)
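Findings like the ones above come from simple aggregations over the cleaned table. This sketch assumes a `"311_cleaned"` table with `borough`, `complaint_type`, `created_date` and `closed_date` columns, with dates already normalized to ISO 8601 text. SQLite's `julianday()` does not parse US-style `MM/DD/YYYY` timestamps, so those would need converting during cleaning. The function names are hypothetical.

```python
import sqlite3

# Identifiers cannot be bound as "?" parameters, so only whitelisted names are interpolated.
ALLOWED_COLUMNS = {"borough", "complaint_type"}


def top_counts(conn: sqlite3.Connection, column: str, limit: int = 10) -> list:
    """Most frequent values of `column` in "311_cleaned", as (value, count) pairs."""
    if column not in ALLOWED_COLUMNS:
        raise ValueError(f"unsupported column: {column!r}")
    query = (
        f'SELECT {column}, COUNT(*) AS n FROM "311_cleaned" '
        f"GROUP BY {column} ORDER BY n DESC, {column} LIMIT ?"
    )
    return conn.execute(query, (limit,)).fetchall()


def mean_response_hours(conn: sqlite3.Connection) -> list:
    """Mean hours from created_date to closed_date per borough, closed requests only."""
    return conn.execute(
        """
        SELECT borough,
               AVG((julianday(closed_date) - julianday(created_date)) * 24) AS hours
        FROM "311_cleaned"
        WHERE closed_date IS NOT NULL AND closed_date <> ''
        GROUP BY borough
        ORDER BY hours
        """
    ).fetchall()
```

Ties in `top_counts` are broken alphabetically so the output is deterministic.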
Statistical methods:

- Hypothesis Test 1: Two-sample t-test (Manhattan vs Brooklyn response times)
- Hypothesis Test 2: Chi-square test of independence (complaint type × borough)
- Correlation Analysis: Pearson and Spearman coefficients
- Regression: OLS model predicting response time
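To make the two hypothesis tests concrete, this standard-library sketch computes the test statistics themselves. It assumes the t-test uses Welch's unequal-variance form, because the source only says "two-sample". It stops short of p-values, which require the t and chi-square distributions. In practice, `scipy.stats.ttest_ind(a, b, equal_var=False)` and `scipy.stats.chi2_contingency(table)` return both the statistic and the p-value. Note that `chi2_contingency` applies Yates' continuity correction by default when the table has one degree of freedom, so for 2×2 tables its statistic will differ from the uncorrected one below.

```python
import math
from statistics import mean, variance
from typing import List, Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    # Squared standard errors of each mean (sample variance uses n - 1).
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def chi_square(table: List[List[int]]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows (complaint type) and columns (borough).
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof
```

For the complaint type × borough test, `table` would hold one row per complaint type and one column per borough, each cell a request count.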
Requirements:

- Python 3.9+
- See `requirements.txt` for dependencies
Author: Muhammad Muntazar Tasiu (20231725)