
Cleaning the raw Statcast data from the 2018 MLB regular season in Jupyter and then importing it into a SQL database. MySQL was used to compute all batter statistics and create a new CSV with all the Statcast data added. More Tableau visualizations will be added soon. 700,000+ entries reduced to a clean, accurate, sortable stat line.


mcgeec91/MLB_Statcast


MLB_Statcast

WEBSITE: https://mcgeec91.github.io/MLB_Statcast/website/index.html

HOW TO RUN

  • Download the repository to your local drive.
  • Run the Statcast data-fixing file first.
  • Run part_2 to create the database. You need to add your MySQL password, and you may need to change the PATH to your MySQL database location. The code is set up for MySQL with user root on localhost:3306 and a personal password.
  • Run every statement in the SQL script sql_db_script.txt in this repository.
  • Export the 2018_MLB_batters table from MySQL to a CSV and place it inside the repository. My copy of the data is already in the repository as "2018_mlb_batters.csv".
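The database steps (load pitch-level rows, run an SQL script that aggregates one stat line per batter, export that table to a CSV) can be sketched end to end in Python. This is a minimal sketch, not the repository's code: it uses the standard-library sqlite3 module as a stand-in for MySQL so it runs without a server, and the table names `all_pitches` and `batters_2018` plus the aggregate columns are hypothetical. It assumes `events` is NULL on pitches that do not end a plate appearance.

```python
import csv
import sqlite3


def build_batter_table(pitches):
    """Aggregate pitch-level rows into one stat line per batter.

    sqlite3 stands in for MySQL; table and column names are hypothetical,
    loosely modeled on the Statcast batter / events / launch_speed fields.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE all_pitches (batter INTEGER, events TEXT, launch_speed REAL)"
    )
    conn.executemany("INSERT INTO all_pitches VALUES (?, ?, ?)", pitches)
    # executescript runs several SQL statements in one call, similar to
    # running all the lines of an SQL script file.
    conn.executescript(
        """
        CREATE TABLE batters_2018 AS
        SELECT batter,
               COUNT(events) AS plate_appearances,       -- non-NULL events only
               SUM(events = 'home_run') AS home_runs,    -- 1/0 per row, NULLs skipped
               ROUND(AVG(launch_speed), 1) AS avg_exit_velo
        FROM all_pitches
        GROUP BY batter;
        """
    )
    return conn


def export_table_to_csv(conn, table, path):
    """Write every row of `table`, plus a header row, to a CSV file.

    The table name is interpolated directly, so it must be trusted input.
    """
    cursor = conn.execute(f"SELECT * FROM {table}")
    header = [col[0] for col in cursor.description]
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(header)
        writer.writerows(cursor)
```

Usage: `conn = build_batter_table(rows)` followed by `export_table_to_csv(conn, "batters_2018", "out.csv")`, where each row is a `(batter, events, launch_speed)` tuple. With MySQL the SQL would go through a MySQL driver instead; the export logic (header from `cursor.description`, then the rows) carries over to other DB-API drivers, which require creating a cursor explicitly.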

Does a pitcher's pitch velocity or spin rate affect a batter's exit velocity?
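One simple first look at this question is a Pearson correlation between a pitch attribute and exit velocity on balls in play. This is a hedged sketch of that swapped-in technique, not the analysis used in this repository; it assumes rows are dicts keyed by the Baseball Savant column names `release_speed`, `release_spin_rate`, and `launch_speed`. A correlation only measures linear association, so it cannot by itself show that velocity or spin causes a change in exit velocity.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / math.sqrt(var_x * var_y)


def correlate_with_exit_velo(pitches, pitch_col):
    """Correlate `pitch_col` (e.g. "release_speed") with launch_speed.

    Rows missing either value are dropped first; pitches that were not
    put in play have no launch_speed.
    """
    pairs = [
        (p[pitch_col], p["launch_speed"])
        for p in pitches
        if p.get(pitch_col) is not None and p.get("launch_speed") is not None
    ]
    if len(pairs) < 2:
        raise ValueError("need at least two balls in play with both values")
    xs, ys = zip(*pairs)
    return pearson_r(xs, ys)
```

Calling it once with `"release_speed"` and once with `"release_spin_rate"` gives one coefficient for each half of the question.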

Barrels Visualization
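A barrel rate per batter can be computed from pitch-level rows before visualizing it. This sketch assumes each row has a `batter` ID and the Statcast `launch_speed_angle` code, where 6 is assumed to mean "Barrel" and a missing code means no batted ball; it divides barrels by batted-ball events, which is one common denominator (barrels per plate appearance is another). It is an illustration, not the calculation behind the visualization here.

```python
from collections import defaultdict

BARREL_CODE = 6  # assumed Statcast launch_speed_angle code for "Barrel"


def barrel_rates(rows):
    """Return {batter: barrels / batted-ball events}.

    Rows whose launch_speed_angle is missing (no batted ball) are skipped.
    """
    barrels = defaultdict(int)
    batted_balls = defaultdict(int)
    for row in rows:
        code = row.get("launch_speed_angle")
        if code is None:
            continue
        batted_balls[row["batter"]] += 1
        if code == BARREL_CODE:
            barrels[row["batter"]] += 1
    # Every batter in batted_balls has at least one event, so n > 0.
    return {batter: barrels[batter] / n for batter, n in batted_balls.items()}
```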

(*NOTE) bat_id_clean.csv is the cleaned version of bat_id.csv, which is created by running fixing_all_pitches.ipynb. I used Excel to clean the data. Problems included: one ID was missing; names like J.D. Martinez showed up as first name = "J." and last name = "D."; and a few players had three-part names, which will be a problem when using FanGraphs data later in the analysis.
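The same checks could also be scripted instead of done by hand in Excel. In this sketch (an alternative, not the method used here) names are split on whitespace only, so initials such as "J.D." stay intact, and rows with a missing ID or a name of more than two parts are flagged for manual review. The row keys `id` and `name` are hypothetical; bat_id.csv may use different column names.

```python
def split_name(full_name):
    """Split a full name into (first, last, needs_review) on whitespace.

    Initials like "J.D." are kept whole. Names with one part or more than
    two parts are flagged, with extra parts kept in `last`.
    """
    parts = full_name.split()
    if len(parts) < 2:
        return (parts[0] if parts else ""), "", True
    first, last = parts[0], " ".join(parts[1:])
    return first, last, len(parts) > 2


def find_problem_rows(rows):
    """Return rows with a missing ID or a name that needs manual review."""
    problems = []
    for row in rows:
        missing_id = row.get("id") in (None, "")
        _, _, needs_review = split_name(row.get("name") or "")
        if missing_id or needs_review:
            problems.append(row)
    return problems
```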
