Evaluators help you measure the attributes of LLM-generated text through the lens of learning science.
We build learning-science-backed LLM-as-judge systems that can be integrated directly into your product or evaluation stack.
Use cases include:
- Feature optimization: Use fine-grained literacy evaluation to sharpen a feature’s AI-generated content and deliver it consistently, so it aligns with pedagogy and your goals.
- Maintaining performance: Ensure content is generated as expected by using the evaluators as product analytics for your LLM output.
- Model selection: Make a confident decision about which model is right for your product by testing the output of models you’re considering.
Evaluators and the supporting datasets are built in collaboration with leading literacy experts from Student Achievement Partners and the Achievement Network.
| Path | Description |
|---|---|
| `evals` | Evaluator code and prompts |
| `datasets` | Expert-annotated datasets used to create and validate the evaluators |
| `LICENSE` | Open source license details |
Check out the full docs for complete setup instructions and usage examples.
To try or use the evaluators, clone the repository and follow the instructions below.
If you’d like to download or access our evaluators and datasets directly, follow the links below.
- Evaluators literacy package
- Datasets
We rely on the Python interpreter to power the evaluators. All examples and tutorials are provided as Python code snippets.
Setup on Mac/Linux
You’ll need Python 3.10 or newer. To verify your version of Python, run the following command in the terminal:

```
python3 --version
```

Creating an isolated environment is a best practice that prevents conflicts between Python packages used in this project and others on your system.

```
python3 -m venv .venv
source .venv/bin/activate
```

Remember to activate the virtual environment for each new shell session when working with Evaluators.
The list of required packages is provided in the requirements.txt file.
```
pip install -r evals/requirements.txt
```

We use both OpenAI and Google Gemini for different evaluators, so you need API keys from both platforms:
- OpenAI: https://platform.openai.com/
- Gemini: https://aistudio.google.com/
Set the key(s) as environment variables in your shell session:
```
export OPENAI_API_KEY="sk-your-key-here"
export GOOGLE_API_KEY="your-key-here"
```

Setup on Windows
You’ll need Python 3.10 or newer. To verify your version of Python, run the following command in the terminal:
```
python --version
```

Open a Command Prompt and run:
```
python -m venv .venv
.venv\Scripts\activate
```

Or in PowerShell:
```
python -m venv .venv
.venv\Scripts\Activate.ps1
```

Remember to activate the virtual environment for each new shell session when working with Evaluators.
Install the required packages:

```
pip install -r evals/requirements.txt
```

Get your API keys from:
- OpenAI: https://platform.openai.com/
- Gemini: https://aistudio.google.com/
Set the key(s) as environment variables:
In Command Prompt:
```
set OPENAI_API_KEY=sk-your-key-here
set GOOGLE_API_KEY=your-key-here
```

In PowerShell:
```
$env:OPENAI_API_KEY="sk-your-key-here"
$env:GOOGLE_API_KEY="your-key-here"
```
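Whichever shell you used, you can confirm the keys are visible to Python before launching a notebook. The snippet below is an optional sanity check using only the standard library; it is not part of the Evaluators package.

```python
import os

# Optional sanity check: confirm both API keys are visible to the Python
# process before running any evaluator notebook.
for key in ("OPENAI_API_KEY", "GOOGLE_API_KEY"):
    status = "is set" if os.environ.get(key) else "is MISSING"
    print(f"{key} {status}")
```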
You are now ready to run the evaluator examples. We recommend using a Jupyter notebook for interactive exploration.
- Start JupyterLab:

  ```
  jupyter lab
  ```

  Jupyter will open in your web browser (usually at http://localhost:8888).
- Browse into the `evals` folder, then double-click the evaluator you want to try.
- Copy the text you want to evaluate into the last code cell of the notebook to run an evaluator on your text sample.
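If you are curious what an LLM-as-judge call looks like under the hood, the sketch below shows the general shape using the OpenAI Python client. It is not the repository’s actual prompt, rubric, or model configuration; those live in the notebooks themselves, so treat the system prompt and model name here as illustrative assumptions.

```python
from openai import OpenAI

# Illustrative sketch only: the evaluator notebooks ship their own
# expert-developed prompts and rubrics. The rubric text and model name
# below are placeholders, not the Evaluators' actual configuration.
client = OpenAI()  # reads OPENAI_API_KEY from the environment

sample_text = "Paste the AI-generated passage you want to evaluate here."

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model choice
    messages=[
        {
            "role": "system",
            "content": (
                "You are a literacy evaluator. Rate the passage's vocabulary "
                "complexity on a 1-5 scale and briefly explain your rating."
            ),
        },
        {"role": "user", "content": sample_text},
    ],
)

print(response.choices[0].message.content)
```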
If you prefer using an IDE with Python and Jupyter notebook support, such as VS Code with Microsoft’s Python and Jupyter extensions, please refer to Microsoft’s instructions for their installation and configuration.
We want to hear from you. For questions or feedback, please open an issue or reach out to us at support@learningcommons.org.
Learn more about our Evaluators or join our private beta to access:
- Full set of Evaluators focused on Literacy and Student Feedback
- Early access to new features, including our API and dashboards
- Personalized support from the Evaluators team
Contact us here.
If you believe you have found a security issue, please responsibly disclose by contacting us at security@learningcommons.org.
The resources provided in this repository are made available "as-is", without warranties or guarantees of any kind. They may contain inaccuracies, limitations, or other constraints depending on the context of use. Use of these resources is subject to our Terms of Use.
By accessing or using these resources, you acknowledge that:
- You are responsible for evaluating their suitability for your specific use case.
- Learning Commons makes no representations about the accuracy, completeness, or fitness of these resources for any particular purpose.
- Any use of the materials is at your own risk, and Learning Commons is not liable for any direct or indirect consequences that may result.
Please refer to each resource’s README, license, and associated docs for any additional limitations, attribution requirements, or guidance specific to that resource.
