Tech stack: Python (pandas, sqlite3), SQL (SQLite), Jupyter Notebook
This project demonstrates end-to-end data preprocessing and cleaning for NYC restaurant inspection and ratings data.
Cleaned tables are stored in a SQLite database (restaurant_data.db) and analytical SQL queries are run to answer practical questions.
This README intentionally does not include figures; instead, long SQL outputs are exported to CSV files for inspection.
- Retrieve raw inspection and ratings data (
pd.read_html()and API). - Clean and standardize both tables (dates, types, names, price levels, ZIP codes, dtypes).
- Save cleaned DataFrames into a SQLite database.
- Run analytical SQL queries (average scores by cuisine, price-level impact, pass rates by inspection type).
- Export long SQL results to CSV files for sharing.
import os, pandas as pd, sqlite3, requests
from dotenv import load_dotenv
url = "https://dlai-lc-dag.s3.us-east-2.amazonaws.com/restaurant_inspections.html"
tables = pd.read_html(url)
inspections = tables[0]
load_dotenv()
API_KEY = os.getenv("RESTAURANTS_API_KEY")
params = {"api_key": API_KEY, "limit": 1000, "offset": 0}
(loop with offset=i*1000 to collect ~35k rows)
- Convert date strings to
datetime - Normalize business names (uppercase, strip special chars)
- Map
price_levelstrings ($$$$$→5 …- No ratings yet→0) - Extract ZIP codes, filter to NYC (10001–11697)
- Drop duplicates and invalid rows
conn = sqlite3.connect("restaurant_data.db")
inspections_clean.to_sql("inspections", conn, if_exists="replace", index=False)
ratings_clean.to_sql("ratings", conn, if_exists="replace", index=False)
conn.close()
Average score by cuisine
SELECT CUISINE_DESCRIPTION, COUNT(*) AS total_inspections,
AVG(SCORE) AS avg_score
FROM inspections
GROUP BY CUISINE_DESCRIPTION
ORDER BY avg_score;
Price level impact
SELECT DISTINCT inspections.DBA, inspections.SCORE, inspections.INSPECTION_DATE,
ratings.price_level, ratings.rating
FROM inspections
JOIN ratings
ON inspections.DBA = ratings.NAME
AND inspections.ZIPCODE = ratings.zip_code
WHERE ratings.price_level IS NOT NULL
ORDER BY inspections.SCORE DESC;
Pass rate by inspection type
SELECT INSPECTION_TYPE,
COUNT(*) AS total_inspections,
AVG(CASE WHEN SCORE <= 13 THEN 1 ELSE 0 END) * 100 AS pass_rate,
AVG(SCORE) AS avg_score
FROM inspections
WHERE SCORE IS NOT NULL
GROUP BY INSPECTION_TYPE
ORDER BY pass_rate DESC;
conn.close()
- Cuisine types with average inspection scores
- Joined inspections + ratings with
SCORE,price_level,rating - Pass-rate table grouped by
INSPECTION_TYPE(pass defined asSCORE <= 13)