This project aims to analyze transportation efficiency in New York City using data from the Citibike bike-sharing system. It provides Java MapReduce code to process the Citibike data, Trino SQL queries to analyze the processed output, and scripts to download, merge, and clean the datasets. The repository is laid out as follows:
.
├── .gitignore
├── LICENSE
├── README.md
├── citibike
│   ├── CitibikeCounter.java
│   ├── CitibikeDriver.java
│   ├── CitibikeMapper.java
│   ├── CitibikeQueries.sql
│   ├── CitibikeUtils.java
│   └── Makefile
├── notes
│   ├── citibike.md
│   └── citibike_analysis.md
└── scripts
    └── citibike
        ├── clean_datadir.sh
        ├── dataexploration.ipynb
        ├── download.py
        └── merge_datasets.ipynb
The `citibike` directory contains the MapReduce job and the analysis queries:
- `CitibikeCounter.java`: Java code for Citibike counting operations.
- `CitibikeDriver.java`: Java driver code that executes the Citibike MapReduce job.
- `CitibikeMapper.java`: Java code for the Citibike MapReduce operations.
- `CitibikeQueries.sql`: Trino queries to analyze the processed Citibike data.
- `CitibikeUtils.java`: Utility class for Citibike-related operations.
- `Makefile`: Builds and runs the Java project.
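The job's own mapper and driver are not reproduced here. As a hedged illustration of the map, group, and sum pattern such a job typically follows, the following is a Hadoop-free Java sketch that counts trips per start station. It is not the project's implementation: the class name `TripCountSketch`, the CSV splitter, and the sample rows are hypothetical, and it assumes the post-2021 Citibike CSV layout (`ride_id, rideable_type, started_at, ended_at, start_station_name, ...`), in which `start_station_name` is the fifth column.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, Hadoop-free sketch of a map -> group -> sum trip count.
class TripCountSketch {

    // ASSUMPTION: index of start_station_name in the post-2021 Citibike schema.
    static final int START_STATION_COL = 4;

    // Minimal CSV splitter that honours double-quoted fields (no escaped quotes).
    static List<String> splitCsv(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == '"') {
                inQuotes = !inQuotes;          // toggle quoted state, drop the quote
            } else if (c == ',' && !inQuotes) {
                fields.add(cur.toString());    // field boundary outside quotes
                cur.setLength(0);
            } else {
                cur.append(c);
            }
        }
        fields.add(cur.toString());            // last field
        return fields;
    }

    // "Map" step: return the start station of a data row, or null to skip the row.
    static String mapLine(String line) {
        if (line.startsWith("ride_id")) return null;  // skip the header row
        List<String> f = splitCsv(line);
        if (f.size() <= START_STATION_COL || f.get(START_STATION_COL).isEmpty()) {
            return null;                               // malformed or missing station
        }
        return f.get(START_STATION_COL);
    }

    // "Group + reduce" step: sum a count of 1 per emitted station key.
    static Map<String, Integer> countTrips(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String key = mapLine(line);
            if (key != null) counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Invented sample rows, for illustration only.
        List<String> sample = List.of(
            "ride_id,rideable_type,started_at,ended_at,start_station_name,start_station_id",
            "A1,classic_bike,2023-06-01 08:00:00,2023-06-01 08:12:00,\"Station A\",S1",
            "A2,electric_bike,2023-06-01 09:00:00,2023-06-01 09:30:00,\"Station A\",S1",
            "A3,classic_bike,2023-06-01 10:00:00,2023-06-01 10:05:00,\"Station B\",S2",
            "A4,classic_bike,2023-06-01 11:00:00,2023-06-01 11:20:00,,");
        System.out.println(countTrips(sample));  // prints {Station A=2, Station B=1}
    }
}
```

In a real Hadoop job, `mapLine` would correspond to the body of a `Mapper.map` call emitting `(station, 1)`, and the `TreeMap` merge would be replaced by the framework's shuffle plus a summing `Reducer`; the in-memory version only exists so the pattern can run without a cluster.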
The `notes` directory contains:
- `citibike.md`: Notes about the Citibike dataset and its attributes, used during preprocessing.
- `citibike_analysis.md`: Notes on the analysis of the Citibike data.
The `scripts/citibike` directory contains:
- `clean_datadir.sh`: Shell script to clean the data directory.
- `dataexploration.ipynb`: Jupyter notebook for exploratory data analysis.
- `download.py`: Python script to download the Citibike datasets.
- `merge_datasets.ipynb`: Jupyter notebook to merge multiple Citibike datasets.
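Merging here is done in a notebook. For readers following the Java side of the project, the sketch below shows one way to express the same step: it concatenates several CSV files that share a header, keeping the header once and rejecting files whose header differs. It is a hypothetical illustration, not a port of `merge_datasets.ipynb`; the class name `MergeSketch`, the header check, and the command-line arguments are assumptions.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: merge CSV files that share one header row.
class MergeSketch {

    // Concatenate CSV contents, keeping the first header only once.
    // Throws if a later file's header differs from the first one.
    static List<String> mergeCsvLines(List<List<String>> files) {
        List<String> merged = new ArrayList<>();
        String header = null;
        for (List<String> lines : files) {
            if (lines.isEmpty()) continue;                  // ignore empty files
            String h = lines.get(0);
            if (header == null) {
                header = h;
                merged.add(h);                              // emit header once
            } else if (!header.equals(h)) {
                throw new IllegalArgumentException("Header mismatch: " + h);
            }
            merged.addAll(lines.subList(1, lines.size()));  // data rows only
        }
        return merged;
    }

    // File-based wrapper: read every input CSV fully, then write the merged result.
    static void mergeFiles(List<Path> inputs, Path output) throws IOException {
        List<List<String>> contents = new ArrayList<>();
        for (Path p : inputs) contents.add(Files.readAllLines(p));
        Files.write(output, mergeCsvLines(contents));
    }

    public static void main(String[] args) throws IOException {
        // Usage: java MergeSketch.java <output.csv> <input1.csv> [<input2.csv> ...]
        if (args.length < 2) {
            System.err.println("usage: MergeSketch <output.csv> <input.csv>...");
            return;
        }
        List<Path> inputs = new ArrayList<>();
        for (int i = 1; i < args.length; i++) inputs.add(Path.of(args[i]));
        mergeFiles(inputs, Path.of(args[0]));
    }
}
```

The header check matters because Citibike's trip files have not used a single schema across all years, so blindly concatenating them can mix incompatible columns. `Files.readAllLines` keeps the sketch short but loads each file into memory; full-size monthly files would call for a streaming reader instead.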
To run the pipeline:
- Run the `scripts/citibike/download.py` script to download the Citibike datasets.
- Use the `scripts/citibike/merge_datasets.ipynb` notebook to merge the downloaded datasets.
- Run `make` in the `citibike` directory to build and execute the MapReduce project.
- Analyze the Citibike data using the queries in `citibike/CitibikeQueries.sql`.