In this repository, we present a deployment-ready AWS stack that uses AWS Step Functions to orchestrate AutoML workflows with AutoGluon on Amazon SageMaker.
A complete description can be found in the corresponding blog post.
The stack defines three State Machines: the Main State Machine, the Training State Machine, and the Deployment State Machine (diagrams omitted).
- Node.js 16.13.1
- Python 3.7.10
- Clone this repository to your cloud environment of choice (Cloud9, EC2 instance, local AWS environment, ...)
- Create the IAM role needed to deploy the stack (skip to step 3 if you already have a role with sufficient permissions and trust relationship).
- Using the AWS CLI:
  - Configure the AWS CLI profile you would like to use, if not configured yet, with `aws configure` and follow the instructions
  - Create a new IAM role that can be used by CloudFormation with `aws iam create-role --role-name {YOUR_ROLE_NAME} --assume-role-policy-document file://trust_policy.json`
  - Attach a permissions policy to the new role with `aws iam put-role-policy --role-name {YOUR_ROLE_NAME} --policy-name {YOUR_POLICY_NAME} --policy-document file://permissions_policy.json`
 
- Alternatively, you can create the role using the AWS IAM Management Console. Once created, make sure to update its Trust Relationship with `trust_policy.json` and attach a custom Permissions Policy based on `permissions_policy.json`
- Create a new Python virtual environment with `python3 -m venv .venv`
- Activate the environment with `source .venv/bin/activate`
- Install AWS CDK with `npm install -g aws-cdk@2.8.0`
- Install the requirements with `pip install -r requirements.txt`
- Bootstrap AWS CDK for your AWS account with `cdk bootstrap aws://{AWS_ACCOUNT_ID}/{REGION}`. If your account has already been bootstrapped with `cdk@1.X`, you may need to manually delete the `CDKToolkit` stack from the AWS CloudFormation console to avoid compatibility issues with `cdk@2.X`. Once de-bootstrapped, proceed by re-bootstrapping.
- Deploy the stack with `cdk deploy -r {NEW_ROLE_ARN}`
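As a reference for the role-creation step above, a minimal `trust_policy.json` would allow AWS CloudFormation to assume the role. This is only an illustrative sketch of the expected shape; use the `trust_policy.json` file shipped with this repository:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "cloudformation.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```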
Once the stack is deployed, you can familiarize yourself with the resources using the tutorial notebook `notebooks/AutoML Walkthrough.ipynb`.
Action flows defined in AWS Step Functions are called State Machines.
Each machine has parameters that can be defined at runtime (i.e. execution-specific), which are specified through an input JSON object. Some examples of input parameters are provided in `notebooks/input/`. Although they are meant to be used during the notebook tutorial, you can also copy/paste them directly into the AWS Console.
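Besides the AWS Console, executions can also be started programmatically. A minimal sketch with `boto3` (the state machine ARN and input path are placeholders you would substitute with your own):

```python
import json


def start_automl_execution(state_machine_arn, input_path):
    """Start a State Machine execution from a JSON input file (sketch)."""
    import boto3  # requires AWS credentials to be configured

    # Load one of the example inputs, e.g. from notebooks/input/
    with open(input_path) as f:
        execution_input = json.load(f)

    sfn = boto3.client("stepfunctions")
    response = sfn.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(execution_input),
    )
    return response["executionArn"]
```

`start_execution` returns immediately; the execution can then be followed from the Step Functions console or polled with `describe_execution`.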
Request Syntax
```
{
    "Parameters": {
      "Flow": {
        "Train": true|false,
        "Evaluate": true|false,
        "Deploy": true|false
      },
      "PretrainedModel": {
          "Name": "string"
      },
      "Train": {
        "TrainDataPath": "string",
        "TestDataPath": "string",
        "TrainingOutput": "string",
        "InstanceCount": int,
        "InstanceType": "string",
        "FitArgs": "string",
        "InitArgs": "string"
      },
      "Evaluation": {
        "Threshold": float,
        "Metric": "string"
      },
      "Deploy": {
        "InstanceCount": int,
        "InstanceType": "string",
        "Mode": "endpoint"|"batch",
        "BatchInputDataPath": "string",
        "BatchOutputDataPath": "string"
      }
    }
}
```
Parameters
- Flow
  - Train (bool) - (REQUIRED) indicates if a new AutoGluon SageMaker Training Job is required. Set to `false` to deploy a pretrained model.
  - Evaluate (bool) - set to `true` if evaluation is required. If selected, an AWS Lambda function will retrieve model performance on the test set and evaluate it against a user-defined threshold. If model performance is not satisfactory, deployment is skipped.
  - Deploy (bool) - (REQUIRED) indicates if the model has to be deployed.
- PretrainedModel
  - Name (string) - indicates which pre-trained model is to be used for deployment. Models are referenced through their SageMaker Model Name. If `Flow.Train = true` this field is ignored, otherwise it is required.
- Train (REQUIRED if `Flow.Train = true`)
  - TrainDataPath (string) - S3 URI where the train `csv` is stored. Header and target variable are required. AutoGluon will perform the holdout split for validation automatically.
  - TestDataPath (string) - S3 URI where the test `csv` is stored. Header and target variable are required. The dataset is used to evaluate model performance on samples not seen during training.
  - TrainingOutput (string) - S3 URI where model artifacts are stored at the end of the training job.
  - InstanceCount (int) - Number of instances to be used for training.
  - InstanceType (string) - AWS instance type to be used for training (e.g. ml.m4.2xlarge). See full list here.
  - FitArgs (string) - double JSON-encoded dictionary containing parameters to be used during model `.fit()`. List of available parameters here. The dictionary needs to be encoded twice because it will be decoded by both the State Machine and the SageMaker Training Job.
  - InitArgs (string) - double JSON-encoded dictionary containing parameters to be used when the model is initialized with `TabularPredictor()`. List of available parameters here. The dictionary needs to be encoded twice because it will be decoded by both the State Machine and the SageMaker Training Job. Common parameters are `label`, `problem_type` and `eval_metric`.
- Evaluation (REQUIRED if `Flow.Evaluate = true`)
  - Threshold (float) - Metric threshold to consider model performance satisfactory. All metrics are maximized (e.g. losses are represented as negative losses).
  - Metric (string) - Metric name used for evaluation. Accepted metrics correspond to the available `eval_metric` values from AutoGluon.
- Deploy (REQUIRED if `Flow.Deploy = true`)
  - InstanceCount (int) - Number of instances to be used for deployment.
  - InstanceType (string) - AWS instance type to be used for deployment (e.g. ml.m4.2xlarge). See full list here.
  - Mode (string) - Model deployment mode. Supported modes are `batch` for SageMaker Batch Transform Jobs and `endpoint` for SageMaker Endpoints.
  - BatchInputDataPath (string) - (REQUIRED if `Mode = batch`) S3 URI of the dataset against which predictions are generated. Data must be stored in `csv` format, without header and with the same column order as the training dataset.
  - BatchOutputDataPath (string) - (REQUIRED if `Mode = batch`) S3 URI where batch predictions are stored.
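Since `FitArgs` and `InitArgs` must be double JSON-encoded, it can help to build the input in Python rather than by hand. A sketch with purely illustrative paths and values:

```python
import json

# Hypothetical parameter values for illustration only.
init_args = {"label": "target", "problem_type": "binary", "eval_metric": "roc_auc"}
fit_args = {"time_limit": 600}

execution_input = {
    "Parameters": {
        "Flow": {"Train": True, "Evaluate": True, "Deploy": True},
        "Train": {
            "TrainDataPath": "s3://my-bucket/train.csv",
            "TestDataPath": "s3://my-bucket/test.csv",
            "TrainingOutput": "s3://my-bucket/output/",
            "InstanceCount": 1,
            "InstanceType": "ml.m5.xlarge",
            # json.dumps is applied twice: one decoding happens in the
            # State Machine, the second in the SageMaker Training Job.
            "InitArgs": json.dumps(json.dumps(init_args)),
            "FitArgs": json.dumps(json.dumps(fit_args)),
        },
        # All metrics are maximized; loss-like metrics would use a
        # negative threshold.
        "Evaluation": {"Threshold": 0.85, "Metric": "roc_auc"},
        "Deploy": {
            "InstanceCount": 1,
            "InstanceType": "ml.m5.xlarge",
            "Mode": "endpoint",
        },
    }
}

# Decoding twice recovers the original dictionary.
decoded = json.loads(json.loads(execution_input["Parameters"]["Train"]["InitArgs"]))
assert decoded == init_args

print(json.dumps(execution_input, indent=2))
```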
 
- `app.py` entrypoint
- `stepfunctions_automl_workflow/lambdas/` AWS Lambda source scripts
- `stepfunctions_automl_workflow/utils/` utility functions used for stack generation
- `stepfunctions_automl_workflow/stack.py` CDK stack definition
- `notebooks/` Jupyter notebooks to familiarise with the artifacts
- `notebooks/input/` input examples to be fed into the State Machines
Clean up all resources with `cdk destroy`.
WARNING: While you'll still be able to keep SageMaker artifacts, the AWS Step Functions State Machines will be deleted along with their execution history.
- `cdk ls` list all stacks in the app
- `cdk synth` emit the synthesized CloudFormation template
- `cdk deploy` deploy this stack to your default AWS account/region
- `cdk diff` compare the deployed stack with the current state
- `cdk docs` open CDK documentation