Generative AI Evaluations Workshop

This workshop teaches systematic approaches to evaluating Generative AI workloads for production use. You'll learn to build evaluation frameworks that go beyond basic metrics to ensure reliable model performance while optimizing cost and performance.

Click here for a slide deck that covers the basics of evaluations and includes an overview of this workshop.

How to use this repository

We strongly recommend working through the first three modules in order. They cover the core of generative AI evaluations, which is critical for every workload. After that, feel free to pick up the workload- and framework-specific modules in any order, according to what is most relevant to you.

What You'll Learn

Core Modules - Do all of these in order!

  • 01 Operational Metrics: evaluate how your workload is running in terms of cost and performance (a minimal sketch follows this list).
  • 02 Quality Metrics: evaluate and tune the quality of your results.
  • 03 Agentic Metrics: evaluate your agents, and use agents for evaluation.
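
As a taste of module 01, here is a minimal operational-metrics sketch: it times a single Amazon Bedrock Converse call and estimates its cost from the returned token counts. The model ID, region, and per-1K-token prices are illustrative assumptions, not values from the workshop.

```python
import time

import boto3

# Assumed region and model ID -- substitute whatever you have access to.
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"

start = time.perf_counter()
response = bedrock.converse(
    modelId=MODEL_ID,
    messages=[{"role": "user", "content": [{"text": "Define model evaluation in one sentence."}]}],
)
client_latency_s = time.perf_counter() - start

# The Converse API reports token usage alongside the response.
usage = response["usage"]
in_tok, out_tok = usage["inputTokens"], usage["outputTokens"]

# Hypothetical per-1K-token prices; look up the real rates for your model.
PRICE_IN, PRICE_OUT = 0.00025, 0.00125
cost = in_tok / 1000 * PRICE_IN + out_tok / 1000 * PRICE_OUT

print(f"latency: {client_latency_s:.2f}s | tokens in/out: {in_tok}/{out_tok} | est. cost: ${cost:.6f}")
```

Tracking these three numbers per request is the starting point for the cost/performance trade-offs module 01 explores in depth.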

Optional Modules - Do any of these in any order!

  • 04 Workload Specific Metrics

    • Intelligent Document Processing
    • Guardrails
    • Basic RAG (an LLM-as-judge sketch follows this list)
    • Multi-modal RAG
    • Synthetic Data Generation (Coming soon!)
    • Speech-to-Speech (Coming soon!)
  • 05 Framework and Tool Specific Implementations

    • PromptFoo
    • AgentCore
    • BrainTrust (Coming soon!)
    • Bedrock Evaluations (Coming soon!)
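
As a preview of the Basic RAG module's quality angle, the sketch below uses an LLM as a judge to score whether an answer is faithful to its retrieved context. The prompt wording and the 1-5 scale are illustrative assumptions, not the workshop's exact rubric.

```python
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")
JUDGE_MODEL_ID = "anthropic.claude-3-haiku-20240307-v1:0"  # assumed judge model

def judge_faithfulness(context: str, answer: str) -> str:
    """Ask the judge model to rate answer faithfulness on a 1-5 scale."""
    prompt = (
        "Rate from 1 (contradicts the context) to 5 (fully supported) how "
        "faithful the answer is to the context.\n\n"
        f"Context:\n{context}\n\nAnswer:\n{answer}\n\n"
        "Reply with the number only."
    )
    response = bedrock.converse(
        modelId=JUDGE_MODEL_ID,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"].strip()

print(judge_faithfulness("The Eiffel Tower is about 330 m tall.", "It stands roughly 330 meters high."))
```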

Prerequisites

  • AWS account with Amazon Bedrock enabled
  • Basic Python and ML familiarity
  • No security expertise required

Getting Started

  1. Clone the repository
  2. Configure AWS credentials (a sanity-check snippet follows this list)
  3. Start with module 01-Operational-Metrics
  4. Complete the rest of the core modules, 02 and 03
  5. Review the workload- and framework-specific modules; choose any, in any order.
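
Before step 3, it can help to confirm that your credentials and Bedrock access work. The snippet below is a hypothetical sanity check using boto3; model visibility varies by account and region, and us-east-1 is only an assumed choice.

```python
import boto3

# Confirm the credentials from step 2 resolve to an identity.
sts = boto3.client("sts")
print("Authenticated as:", sts.get_caller_identity()["Arn"])

# Confirm Bedrock is reachable and models are visible in this region.
bedrock = boto3.client("bedrock", region_name="us-east-1")
models = bedrock.list_foundation_models()["modelSummaries"]
print(f"{len(models)} foundation models visible in us-east-1")
```

If both calls succeed, you are ready to open module 01-Operational-Metrics.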

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.
