Testset generator not preserving persona and scenario metadata in generated samples #2385

@anistark

Description

When using TestsetGenerator with multiple personas, there's no reliable way to track which persona was used to generate each sample in the resulting testset. The persona information is lost after generation, making it impossible to:

  • Correlate generated questions with their source personas
  • Verify persona distribution in the generated dataset
  • Validate that questions are relevant to their intended personas

Users currently resort to workarounds such as string prefixing or processing one persona at a time, both of which are inefficient and degrade the quality of the generated samples. What is needed is a recommended way to preserve persona metadata in generated samples and to access it afterwards.
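To make the "one persona at a time" workaround concrete, here is a minimal sketch of what users end up writing today. It does not use the library's API: the local `Persona` dataclass, `generate_tagged`, `persona_distribution`, `fake_generate`, and the `persona_name` key are all hypothetical names chosen for illustration, and `fake_generate` only stands in for a real per-persona generator call.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Persona:
    # Hypothetical stand-in for a persona definition, not the library's class.
    name: str
    role_description: str


def generate_tagged(
    personas: List[Persona],
    samples_per_persona: int,
    generate_fn: Callable[[Persona, int], List[str]],
) -> List[Dict[str, str]]:
    """Run the generator once per persona and attach the persona name to each sample."""
    tagged: List[Dict[str, str]] = []
    for persona in personas:
        for question in generate_fn(persona, samples_per_persona):
            # "persona_name" is an assumed metadata key, not an existing field.
            tagged.append({"user_input": question, "persona_name": persona.name})
    return tagged


def persona_distribution(samples: List[Dict[str, str]]) -> Counter:
    """Count how many samples were produced for each persona."""
    return Counter(sample["persona_name"] for sample in samples)


def fake_generate(persona: Persona, n: int) -> List[str]:
    # Placeholder for a real generator call restricted to a single persona.
    return [f"Question {i} for a {persona.role_description}" for i in range(n)]


personas = [
    Persona("new_joiner", "new employee learning the basics"),
    Persona("manager", "team lead looking for summaries"),
]
samples = generate_tagged(personas, samples_per_persona=2, generate_fn=fake_generate)
distribution = persona_distribution(samples)
```

The cost of this pattern is one full generation pass per persona, which is the inefficiency described above. If each generated sample carried its persona directly, the same `persona_distribution` check would work on a single multi-persona run.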

Labels: bug (Something isn't working), module-testsetgen (Module testset generation)
