In this repository, we present a deployment-ready AWS stack that uses AWS Step Functions to orchestrate AutoML workflows with AutoGluon on Amazon SageMaker.
A complete description can be found in the corresponding blog post.
The stack defines three State Machines: the Main State Machine, the Training State Machine, and the Deployment State Machine (diagrams omitted).
- Node.js 16.13.1
- Python 3.7.10
- Clone this repository to your cloud environment of choice (Cloud9, EC2 instance, local AWS environment, ...)
- Create the IAM role needed to deploy the stack (skip to step 3 if you already have a role with sufficient permissions and trust relationship).
- Using the AWS CLI:
  - Configure the AWS CLI profile you would like to use, if not configured yet, with `aws configure` and follow the instructions
  - Create a new IAM role that can be used by CloudFormation with `aws iam create-role --role-name {YOUR_ROLE_NAME} --assume-role-policy-document file://trust_policy.json`
  - Attach a permissions policy to the new role with `aws iam put-role-policy --role-name {YOUR_ROLE_NAME} --policy-name {YOUR_POLICY_NAME} --policy-document file://permissions_policy.json`
 
- Alternatively, you can create the role using the AWS IAM Management Console. Once created, make sure to update its Trust Relationship with `trust_policy.json` and attach a custom Permissions Policy based on `permissions_policy.json`
- Create a new Python virtual environment with `python3 -m venv .venv`
- Activate the environment with `source .venv/bin/activate`
- Install AWS CDK with `npm install -g aws-cdk@2.8.0`
- Install the requirements with `pip install -r requirements.txt`
- Bootstrap AWS CDK for your AWS account with `cdk bootstrap aws://{AWS_ACCOUNT_ID}/{REGION}`. If your account has already been bootstrapped with `cdk@1.X`, you may need to manually delete the `CDKToolkit` stack from the AWS CloudFormation console to avoid compatibility issues with `cdk@2.X`. Once de-bootstrapped, proceed by re-bootstrapping.
- Deploy the stack with `cdk deploy -r {NEW_ROLE_ARN}`
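As a reference for the role-creation step above, a minimal `trust_policy.json` would allow AWS CloudFormation to assume the role. This is only an illustrative sketch of the expected shape; use the `trust_policy.json` file shipped with this repository:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "cloudformation.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
```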
Once the stack is deployed, you can familiarize yourself with the resources using the tutorial notebook `notebooks/AutoML Walkthrough.ipynb`.
Action flows defined in AWS Step Functions are called State Machines.
Each machine has parameters that can be defined at runtime (i.e. execution-specific), which are specified through an input JSON object. Some examples of input parameters are provided in `notebooks/input/`. Although they are meant to be used during the notebook tutorial, you can also copy/paste them directly into the AWS Console.
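Besides the AWS Console, executions can also be started programmatically. A minimal sketch with `boto3` (the state machine ARN and input path are placeholders you would substitute with your own):

```python
import json


def start_automl_execution(state_machine_arn, input_path):
    """Start a State Machine execution from a JSON input file (sketch)."""
    import boto3  # requires AWS credentials to be configured

    # Load one of the example inputs, e.g. from notebooks/input/
    with open(input_path) as f:
        execution_input = json.load(f)

    sfn = boto3.client("stepfunctions")
    response = sfn.start_execution(
        stateMachineArn=state_machine_arn,
        input=json.dumps(execution_input),
    )
    return response["executionArn"]
```

`start_execution` returns immediately; the execution can then be followed from the Step Functions console or polled with `describe_execution`.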
Request Syntax
```
{
    "Parameters": {
      "Flow": {
        "Train": true|false,
        "Evaluate": true|false,
        "Deploy": true|false
      },
      "PretrainedModel": {
          "Name": "string"
      },
      "Train": {
        "TrainDataPath": "string",
        "TestDataPath": "string",
        "TrainingOutput": "string",
        "InstanceCount": int,
        "InstanceType": "string",
        "FitArgs": "string",
        "InitArgs": "string"
      },
      "Evaluation": {
        "Threshold": float,
        "Metric": "string"
      },
      "Deploy": {
        "InstanceCount": int,
        "InstanceType": "string",
        "Mode": "endpoint"|"batch",
        "BatchInputDataPath": "string",
        "BatchOutputDataPath": "string"
      }
    }
}
```
Parameters
- Flow
  - Train (bool) - (REQUIRED) indicates if a new AutoGluon SageMaker Training Job is required. Set to `false` to deploy a pretrained model.
  - Evaluate (bool) - set to `true` if evaluation is required. If selected, an AWS Lambda function will retrieve model performance on the test set and evaluate it against a user-defined threshold. If model performance is not satisfactory, deployment is skipped.
  - Deploy (bool) - (REQUIRED) indicates if the model has to be deployed.
- PretrainedModel
  - Name (string) - indicates which pre-trained model is to be used for deployment. Models are referenced through their SageMaker Model Name. If `Flow.Train = true` this field is ignored, otherwise it is required.
- Train (REQUIRED if `Flow.Train = true`)
  - TrainDataPath (string) - S3 URI where the train `csv` is stored. Header and target variable are required. AutoGluon will perform the holdout split for validation automatically.
  - TestDataPath (string) - S3 URI where the test `csv` is stored. Header and target variable are required. The dataset is used to evaluate model performance on samples not seen during training.
  - TrainingOutput (string) - S3 URI where model artifacts are stored at the end of the training job.
  - InstanceCount (int) - Number of instances to be used for training.
  - InstanceType (string) - AWS instance type to be used for training (e.g. ml.m4.2xlarge). See full list here.
  - FitArgs (string) - double JSON-encoded dictionary containing parameters to be used during model `.fit()`. List of available parameters here. The dictionary needs to be encoded twice because it will be decoded by both the State Machine and the SageMaker Training Job.
  - InitArgs (string) - double JSON-encoded dictionary containing parameters to be used when the model is initialized with `TabularPredictor()`. List of available parameters here. The dictionary needs to be encoded twice because it will be decoded by both the State Machine and the SageMaker Training Job. Common parameters are `label`, `problem_type` and `eval_metric`.
- Evaluation (REQUIRED if `Flow.Evaluate = true`)
  - Threshold (float) - Metric threshold to consider model performance satisfactory. All metrics are maximized (e.g. losses are represented as negative losses).
  - Metric (string) - Metric name used for evaluation. Accepted metrics correspond to the available `eval_metric` values from AutoGluon.
- Deploy (REQUIRED if `Flow.Deploy = true`)
  - InstanceCount (int) - Number of instances to be used for deployment.
  - InstanceType (string) - AWS instance type to be used for deployment (e.g. ml.m4.2xlarge). See full list here.
  - Mode (string) - Model deployment mode. Supported modes are `batch` for SageMaker Batch Transform Jobs and `endpoint` for SageMaker Endpoints.
  - BatchInputDataPath (string) - (REQUIRED if `Mode = batch`) S3 URI of the dataset against which predictions are generated. Data must be stored in `csv` format, without header and with the same column order as the training dataset.
  - BatchOutputDataPath (string) - (REQUIRED if `Mode = batch`) S3 URI where batch predictions are stored.
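Since `FitArgs` and `InitArgs` must be double JSON-encoded, it can help to build the input in Python rather than by hand. A sketch with purely illustrative paths and values:

```python
import json

# Hypothetical parameter values for illustration only.
init_args = {"label": "target", "problem_type": "binary", "eval_metric": "roc_auc"}
fit_args = {"time_limit": 600}

execution_input = {
    "Parameters": {
        "Flow": {"Train": True, "Evaluate": True, "Deploy": True},
        "Train": {
            "TrainDataPath": "s3://my-bucket/train.csv",
            "TestDataPath": "s3://my-bucket/test.csv",
            "TrainingOutput": "s3://my-bucket/output/",
            "InstanceCount": 1,
            "InstanceType": "ml.m5.xlarge",
            # json.dumps is applied twice: one decoding happens in the
            # State Machine, the second in the SageMaker Training Job.
            "InitArgs": json.dumps(json.dumps(init_args)),
            "FitArgs": json.dumps(json.dumps(fit_args)),
        },
        # All metrics are maximized; loss-like metrics would use a
        # negative threshold.
        "Evaluation": {"Threshold": 0.85, "Metric": "roc_auc"},
        "Deploy": {
            "InstanceCount": 1,
            "InstanceType": "ml.m5.xlarge",
            "Mode": "endpoint",
        },
    }
}

# Decoding twice recovers the original dictionary.
decoded = json.loads(json.loads(execution_input["Parameters"]["Train"]["InitArgs"]))
assert decoded == init_args

print(json.dumps(execution_input, indent=2))
```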
 
- `app.py` entrypoint
- `stepfunctions_automl_workflow/lambdas/` AWS Lambda source scripts
- `stepfunctions_automl_workflow/utils/` utility functions used for stack generation
- `stepfunctions_automl_workflow/stack.py` CDK stack definition
- `notebooks/` Jupyter notebooks to familiarise with the artifacts
- `notebooks/input/` input examples to be fed into the State Machines
Clean up all resources with `cdk destroy`.
WARNING: While you'll still be able to keep SageMaker artifacts, the AWS Step Functions State Machines will be deleted along with their execution history.
- `cdk ls` list all stacks in the app
- `cdk synth` emit the synthesized CloudFormation template
- `cdk deploy` deploy this stack to your default AWS account/region
- `cdk diff` compare the deployed stack with the current state
- `cdk docs` open CDK documentation