The repository contains the following:
- run.sh: A shell script to:
  - Download the dataset from S3
  - Install all Python requirements
  - Train the model by calling train.py
  - Launch the model server by calling api.py & model_core.py
- API call methods:
  - api_client.py: A Python script that calls the model server
  - curl example
  - Postman collection
- Notebook files:
  - model.ipynb: A notebook documenting the search for the best training approach and the model evaluation
  - data_analysis.ipynb: A notebook containing EDA on the raw data
  - part_two.ipynb: A notebook containing the analysis for part two of the assessment
- Dockerfile: To build a Linux image that runs the model (optional)
Clone the repository:
> git clone parking_citations_analysis.git
To isolate your local Python development environment for this project, you can take advantage of Python virtual environments:
> pip install virtualenv
> cd parking_citations_analysis/
> virtualenv venv
> source venv/bin/activate
You can leave the virtual environment in your terminal by running:
> deactivate
The following shell script should download the required datasets (this may take a few minutes) and train the model:
> ./run.sh
Important note: If you do not already have the dataset downloaded in both forms (CSV and SQLite DB), you need to set the following configuration flags to 1 (true) in the script:
#Should the dataset be downloaded?
download_data_enabled=1;
#Should the sqlite database be created?
create_sqlite_db_enabled=1;
If you want to re-train the model from scratch, set the following configuration flag to 1 (please note that training the model may take hours):
#Should the model be trained?
train_model_enabled=1;
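For orientation, the flags above simply gate the steps inside run.sh. A minimal sketch of that gating is shown below; the exact commands and helper script names (download_data.py, create_sqlite_db.py, requirements.txt) are assumptions and may differ from the real script:

```bash
#!/bin/bash
# Flags: 1 = run the step, 0 = skip it
download_data_enabled=1;
create_sqlite_db_enabled=1;
train_model_enabled=0;

# Install Python requirements (file name assumed)
pip install -r requirements.txt

if [ "$download_data_enabled" -eq 1 ]; then
    # Download the raw CSV dataset from S3 (hypothetical helper)
    python download_data.py
fi

if [ "$create_sqlite_db_enabled" -eq 1 ]; then
    # Build the SQLite copy of the dataset (hypothetical helper)
    python create_sqlite_db.py
fi

if [ "$train_model_enabled" -eq 1 ]; then
    # Re-train the model from scratch -- this can take hours
    python train.py
fi

# Launch the Flask model server
python api.py
```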
If you get the following error when running the script:
./run.sh: Permission denied
Grant read/write/execute permissions on the script to everyone:
chmod 777 run.sh
The script will launch a local Flask server for the model.
If the script runs with no issues, you should see Flask running:
INFO:werkzeug: * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
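As a rough sketch, api.py is expected to be a small Flask app that loads the pretrained model through model_core.py and exposes a single /model endpoint. The details below (the model_core.predict helper name, the exact response construction) are assumptions, not the repository's exact code:

```python
# Hypothetical sketch of api.py -- the real script may differ in detail
from flask import Flask, request, jsonify
import model_core  # assumed to wrap the pretrained model behind a predict() helper

app = Flask(__name__)

@app.route("/model", methods=["POST"])
def predict():
    # Expect a JSON payload with the four features the pretrained model needs
    payload = request.get_json()
    features = {key: payload[key]
                for key in ("Color", "Body Style", "Fine amount", "Plate Expiry Date")}
    probability = model_core.predict(features)  # assumed helper name
    return jsonify({"popular_make_probability": probability})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```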
If you have Docker installed, build the Docker image using the Dockerfile:
> cd parking_citations_analysis/
> docker build . -t citations
This takes a few minutes, as the docker build process also downloads the dataset from S3 and creates a SQLite database copy of it. You should then be able to run the built container:
> docker run -it -p 5000:5000 citations
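For reference, a Dockerfile for this setup might look roughly like the sketch below. The base image, the requirements file, and the helper script names are assumptions; only the overall behavior (install requirements, download the dataset and build the SQLite copy at build time, serve on port 5000) follows from the description above:

```dockerfile
# Hypothetical sketch of the Dockerfile -- the real one may differ
FROM python:3.8-slim

WORKDIR /app
COPY . /app

# Install Python requirements (file name assumed)
RUN pip install -r requirements.txt

# Download the dataset from S3 and build the SQLite copy at image build time
# (helper script names are assumptions)
RUN python download_data.py && python create_sqlite_db.py

EXPOSE 5000
# Launch the Flask model server when the container starts
CMD ["python", "api.py"]
```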
The pretrained model only needs 4 columns to be passed: "Color", "Body Style", "Fine amount", "Plate Expiry Date". You have three options for calling the model server:
There is a quick Python script (api_client.py) that uses Python's http.client library to call the API; you can modify the payload in the code and run the script to call the model server, as sketched below.
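In outline, api_client.py does something like the following; the exact payload values and structure in the repository may differ slightly:

```python
# Hypothetical sketch of api_client.py -- edit the payload and re-run to test the server
import http.client
import json

# Sample record with the four features the pretrained model expects
payload = {
    "Color": "WH",
    "Body Style": "PA",
    "Fine amount": 50.0,
    "Plate Expiry Date": 200304.0,
}

connection = http.client.HTTPConnection("localhost", 5000)
connection.request(
    "POST",
    "/model",
    body=json.dumps(payload),
    headers={"Content-Type": "application/json"},
)

response = connection.getresponse()
print(response.status, response.reason)
print(json.loads(response.read()))  # e.g. {"popular_make_probability": 55.23}
connection.close()
```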
Send your sample data from a separate terminal using curl:
curl --header "Content-Type: application/json" \
--request POST \
--data '{"Color": "WH", "Body Style": "PA", "Fine amount": 50.0,"Plate Expiry Date": 200304.0}' \
http://localhost:5000/model
You should see the response in the form of a JSON object:
{"popular_make_probability":55.23}
If you have Postman installed, you can import the sample Postman collection in the repository where the API calls are pre-defined:
Parking-Citatations-Postman-Call.postman_collection.json