Evaluation of AI outputs against trusted educational rubrics. Measure and improve content quality with research-backed rubrics, ensuring rigor, reliability, and alignment to classroom needs.

Evaluators

Evaluators help you measure the attributes of LLM-generated text through the lens of learning science.

We build learning-science-backed LLM-as-judge systems that can be integrated directly into your product or evaluation stack.

Use cases include:

  • Feature optimization: Use fine-grained literacy evaluation to sharpen a feature’s AI-generated content and deliver it consistently, keeping it aligned with pedagogy and your goals.
  • Maintaining performance: Ensure content is generated as expected by using the evaluators as product analytics for your LLM output.
  • Model selection: Make a confident decision about which model is right for your product by testing the output of models you’re considering.
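
To make the LLM-as-judge pattern concrete, the sketch below shows how a judge model can score a text sample against a rubric prompt using the OpenAI Python SDK. The rubric text, model name, and judge function are illustrative placeholders only; the repository's actual prompts, rubrics, and scoring logic live in the evals folder.

import os
from openai import OpenAI

# Assumes OPENAI_API_KEY is set in the environment (see Quick Start below).
client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

# Hypothetical rubric prompt -- not one of the repository's expert-built rubrics.
RUBRIC_PROMPT = (
    "You are a literacy evaluator. Rate the passage below from 1 to 4 for "
    "clarity and grade-level appropriateness, and briefly justify the rating."
)

def judge(passage: str) -> str:
    """Send the passage and the rubric to the judge model and return its verdict."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system", "content": RUBRIC_PROMPT},
            {"role": "user", "content": passage},
        ],
    )
    return response.choices[0].message.content

print(judge("Photosynthesis is how plants turn sunlight, water, and air into food."))

In practice you would use the rubric prompts and scoring logic from the evals notebooks rather than writing your own.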

Evaluators and the supporting datasets are built in collaboration with leading literacy experts from Student Achievement Partners and the Achievement Network.

Repository Contents

Path       Description
evals      Evaluator code and prompts
datasets   Expert-annotated datasets used to create and validate the evaluators
LICENSE    Open-source license details

Check out the full docs for complete setup instructions and usage examples.

Quick Start

To try or use the evaluators, clone the repository and follow the instructions below.

If you’d like to download or access our evaluators and datasets directly, follow the links below.

Requirements

We rely on the Python interpreter to power the evaluators. All examples and tutorials are provided as Python code snippets.


Setup on Mac/Linux

You’ll need Python 3.10 or newer. To verify your version of Python, run the following command in the terminal:

python3 --version

1. Create a virtual environment

Creating an isolated environment is a best practice that prevents conflicts between Python packages used in this project and others on your system.

python3 -m venv .venv
source .venv/bin/activate

Remember to activate the virtual environment for each new shell session when working with Evaluators.

2. Install dependencies

The list of required packages is provided in the requirements.txt file.

pip install -r evals/requirements.txt

3. Set your API keys

Different evaluators use OpenAI and Google Gemini models, so you need API keys from both platforms.

Set the key(s) as environment variables in your shell session:

export OPENAI_API_KEY="sk-your-key-here"
export GOOGLE_API_KEY="your-key-here"

Setup on Windows

You’ll need Python 3.10 or newer. To verify your version of Python, run the following command in the terminal:

python --version

1. Create a virtual environment

Open a Command Prompt and run:

python -m venv .venv
.venv\Scripts\activate

Or in PowerShell:

python -m venv .venv
.venv\Scripts\Activate.ps1

Remember to activate the virtual environment for each new shell session when working with Evaluators.

2. Install dependencies

pip install -r evals/requirements.txt

3. Set your API keys

As on Mac/Linux, you need API keys from both OpenAI and Google Gemini. Set the key(s) as environment variables:

In Command Prompt:

set OPENAI_API_KEY=sk-your-key-here
set GOOGLE_API_KEY=your-key-here

In PowerShell:

$env:OPENAI_API_KEY="sk-your-key-here"
$env:GOOGLE_API_KEY="your-key-here"
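
Regardless of platform, you can optionally confirm that the keys are visible to Python before opening the notebooks. This is a quick sanity check, not part of the repository's own code:

import os

for key in ("OPENAI_API_KEY", "GOOGLE_API_KEY"):
    # Prints True if the key is present in the current environment, False otherwise.
    print(key, "is set:", bool(os.environ.get(key)))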

Run the Evaluator Code

You are now ready to run the evaluator examples. We recommend using a Jupyter Notebook for interactive exploration.

  1. Start JupyterLab:
jupyter lab

Jupyter will open in your web browser (usually at http://localhost:8888).

  2. Browse into the evals folder, then double-click the evaluator you want to try.
  3. Copy the text you want to evaluate into the last code cell of the notebook to run the evaluator on your text sample.

If you prefer using an IDE with Python and Jupyter notebook support, such as VS Code with Microsoft's Python and Jupyter extensions, please refer to Microsoft's instructions for installing and configuring them.

Support & Feedback

We want to hear from you. For questions or feedback, please open an issue or reach out to us at support@learningcommons.org.

Partner with us

Learn more about our Evaluators or join our private beta to access:

  • Full set of Evaluators focused on Literacy and Student Feedback
  • Early access to new features, including our API and dashboards
  • Personalized support from the Evaluators team

Contact us here.

Reporting Security Issues

If you believe you have found a security issue, please responsibly disclose by contacting us at security@learningcommons.org.

Disclaimer

The resources provided in this repository are made available "as-is", without warranties or guarantees of any kind. They may contain inaccuracies, limitations, or other constraints depending on the context of use. Use of these resources is subject to our Terms of Use.

By accessing or using these resources, you acknowledge that:

  • You are responsible for evaluating their suitability for your specific use case.
  • Learning Commons makes no representations about the accuracy, completeness, or fitness of these resources for any particular purpose.
  • Any use of the materials is at your own risk, and Learning Commons is not liable for any direct or indirect consequences that may result.

Please refer to each resource’s README, license, and associated docs for any additional limitations, attribution requirements, or guidance specific to that resource.
