Generative AI Evaluations Workshop

This workshop teaches systematic approaches to evaluating Generative AI workloads for production use. You'll learn to build evaluation frameworks that go beyond basic metrics to ensure reliable model performance while optimizing cost and performance.

Click here for a slide deck that covers the basics of evaluations and includes an overview of this workshop.

How to use this repository

We strongly recommend working through the first three modules in order. They cover the core of generative AI evaluations, which is critical for every workload. After that, feel free to pick up the workload- and framework-specific modules in any order, according to what is most relevant to you.

What You'll Learn

Core Modules - Do all of these in order!

  • 01 Operational Metrics: evaluate how your workload is running in terms of cost and performance (a minimal sketch follows this list).
  • 02 Quality Metrics: evaluate and tune the quality of your results.
  • 03 Agentic Metrics: evaluate your agents, and use agents for evaluation.
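
As a taste of module 01, here is a minimal operational-metrics sketch: it times a single Amazon Bedrock Converse call and estimates its cost from the returned token counts. The model ID, region, and per-1K-token prices are illustrative assumptions, not values from the workshop.

```python
import time

import boto3

# Assumed region and model ID -- substitute whatever you have access to.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

start = time.perf_counter()
response = bedrock.converse(
    modelId=MODEL_ID,
    messages=[{"role": "user", "content": [{"text": "Define model evaluation in one sentence."}]}],
)
client_latency_s = time.perf_counter() - start

# The Converse API reports token usage alongside the response.
usage = response["usage"]
in_tok, out_tok = usage["inputTokens"], usage["outputTokens"]

# Hypothetical per-1K-token prices; look up the real rates for your model.
PRICE_IN, PRICE_OUT = 0.00025, 0.00125
cost = in_tok / 1000 * PRICE_IN + out_tok / 1000 * PRICE_OUT

print(f"latency: {client_latency_s:.2f}s | tokens in/out: {in_tok}/{out_tok} | est. cost: ${cost:.6f}")
```

Tracking these three numbers per request is the starting point for the cost/performance trade-offs module 01 explores in depth.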

Optional Modules - Do any of these in any order!

  • 04 Workload Specific Metrics

    • Intelligent Document Processing
    • Guardrails
    • Basic RAG (an LLM-as-judge sketch follows this list)
    • Multi-modal RAG
    • Synthetic Data Generation (Coming soon!)
    • Speech-to-Speech (Coming soon!)
  • 05 Framework and Tool Specific Implementations

    • PromptFoo
    • AgentCore
    • BrainTrust (Coming soon!)
    • Bedrock Evaluations (Coming soon!)
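
As a preview of the Basic RAG module's quality angle, the sketch below uses an LLM as a judge to score whether an answer is faithful to its retrieved context. The prompt wording and the 1-5 scale are illustrative assumptions, not the workshop's exact rubric.

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
JUDGE_MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"  # assumed judge model

def judge_faithfulness(context: str, answer: str) -> str:
    """Ask the judge model to rate answer faithfulness on a 1-5 scale."""
    prompt = (
        "Rate from 1 (contradicts the context) to 5 (fully supported) how "
        "faithful the answer is to the context.\n\n"
        f"Context:\n{context}\n\nAnswer:\n{answer}\n\n"
        "Reply with the number only."
    )
    response = bedrock.converse(
        modelId=JUDGE_MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"].strip()

print(judge_faithfulness("The Eiffel Tower is about 330 m tall.", "It stands roughly 330 meters high."))
```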

Prerequisites

  • AWS account with Amazon Bedrock enabled
  • Basic Python and ML familiarity
  • No security expertise required

Getting Started

  1. Clone the repository
  2. Configure AWS credentials (a sanity-check snippet follows this list)
  3. Start with module 01-Operational-Metrics
  4. Complete the rest of the core modules, 02 and 03
  5. Review the workload- and framework-specific modules; choose any, in any order.
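
Before step 3, it can help to confirm that your credentials and Bedrock access work. The snippet below is a hypothetical sanity check using boto3; model visibility varies by account and region, and us-east-1 is only an assumed choice.

```python
import boto3

# Confirm the credentials from step 2 resolve to an identity.
sts = boto3.client("sts")
print("Authenticated as:", sts.get_caller_identity()["Arn"])

# Confirm Bedrock is reachable and models are visible in this region.
bedrock = boto3.client("bedrock", region_name="us-east-1")
models = bedrock.list_foundation_models()["modelSummaries"]
print(f"{len(models)} foundation models visible in us-east-1")
```

If both calls succeed, you are ready to open module 01-Operational-Metrics.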

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
