In this lab, you will learn how to build, evaluate, and compare end-to-end Active Learning (AL) pipelines using AWS and MLflow. The lab focuses on developing and testing classification models that evolve through multiple rounds of data selection, training, and evaluation. You will perform hands-on activities that compare three training strategies (a randomly sampled baseline, uncertainty sampling, and diversity sampling) and measure their effectiveness through model evaluation on a fixed test set.
By the end of this lab, you will understand:
- How to implement Active Learning workflows.
- How to train and evaluate models with real datasets.
- How to track and compare model performance using MLflow.
- How to run ML experiments efficiently on AWS EC2 instances.
- How to visualize model results using the MLflow UI.
To follow along and get the most out of this lab, you should have:
- Basic understanding of machine learning and classification tasks.
- Basic understanding of Python programming.
- Experience working in AWS EC2 environments (not mandatory, but helpful).
Additional requirements:
- AWS account with permissions to launch EC2 instances (preferably with GPU access).
Here are useful links to learn more about the tools and concepts in this lab:
In this lab, we implement an Active Learning-based NLP classification workflow to optimize the labeling process. Traditional supervised learning requires large amounts of labeled data, which is costly and time-consuming to collect. Active Learning allows the model to intelligently select the most informative data points to label, thereby improving performance with fewer labeled examples.
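At a high level, the workflow alternates between training, querying the pool, and labeling the selected points. The sketch below is purely illustrative; the helpers `train_model`, `select_queries`, and `request_labels` are placeholder names, not functions from the lab code.

```python
import numpy as np

def active_learning_loop(X, y, labeled_idx, pool_idx, n_rounds, query_size,
                         train_model, select_queries, request_labels):
    """Schematic Active Learning loop: train, query the pool, label, repeat."""
    for _ in range(n_rounds):
        model = train_model(X[labeled_idx], y[labeled_idx])     # fit on currently labeled data
        picks = select_queries(model, X[pool_idx], query_size)  # positions within the pool
        chosen = pool_idx[picks]                                # global indices to be labeled
        y[chosen] = request_labels(chosen)                      # oracle / human annotation step
        labeled_idx = np.concatenate([labeled_idx, chosen])     # grow the labeled set
        pool_idx = np.delete(pool_idx, picks)                   # shrink the unlabeled pool
    return model
```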
We compare three model training strategies:
- Baseline: Trained on a fixed, randomly sampled dataset.
- Uncertainty Sampling: Selects data where model confidence is low.
- Diversity Sampling: Selects data that best represents the entire feature space.
Each strategy is evaluated on a fixed test set (1,000 samples) and shares a common unlabeled pool (45,000 samples).
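As a minimal sketch of how the two query strategies might be implemented, assuming a scikit-learn-style classifier with a `predict_proba` method and vectorized (e.g., TF-IDF) features; the lab's actual feature extraction and classifier may differ:

```python
import numpy as np
from sklearn.cluster import KMeans

def uncertainty_sampling(model, X_pool, k):
    """Least-confidence sampling: pick the k pool points with the lowest top-class probability."""
    proba = model.predict_proba(X_pool)        # shape (n_pool, n_classes)
    confidence = proba.max(axis=1)             # model's confidence in its predicted class
    return np.argsort(confidence)[:k]          # least confident points first

def diversity_sampling(X_pool, k, random_state=0):
    """Cluster the pool into k groups and take the point nearest each cluster center."""
    km = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X_pool)
    dists = km.transform(X_pool)               # distance of every point to every center
    return np.array([dists[:, c].argmin() for c in range(k)])
```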
You will log metrics and artifacts from each round of training using MLflow and visualize results via its web interface.
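Logging a round could look roughly like the following; the tracking URI, experiment name, and metric names here are illustrative placeholders, not values prescribed by the lab.

```python
import mlflow

# Point the client at the tracking server; the URI and experiment name are assumptions.
mlflow.set_tracking_uri("http://localhost:5000")
mlflow.set_experiment("active-learning-lab")

def log_round(strategy, round_idx, accuracy, artifact_path=None):
    """Log one Active Learning round as an MLflow run so strategies can be compared in the UI."""
    with mlflow.start_run(run_name=f"{strategy}-round-{round_idx}"):
        mlflow.log_param("strategy", strategy)
        mlflow.log_param("round", round_idx)
        mlflow.log_metric("test_accuracy", accuracy)
        if artifact_path is not None:
            mlflow.log_artifact(artifact_path)   # e.g. a saved confusion-matrix image
```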
You will also use an AWS EC2 GPU instance to train models and run the experiments.
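Once the instance is running, a quick sanity check that the GPU is visible from Python (assuming PyTorch is installed in the environment) might look like this:

```python
import torch

# Confirm that the EC2 GPU is visible to the training environment.
if torch.cuda.is_available():
    print("Using GPU:", torch.cuda.get_device_name(0))
else:
    print("No GPU detected; training will run on CPU.")
```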
Made by Vani Seth for Mizzou Cloud DevOps Portal - University of Missouri, Columbia