When using `TestsetGenerator` with multiple personas, there is no reliable way to track which persona was used to generate each sample in the resulting testset. The persona information is lost after generation, which makes it impossible to:
- Correlate generated questions with their source personas
- Verify persona distribution in the generated dataset
- Validate that questions are relevant to their intended personas
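Users currently resort to workarounds such as prefixing persona names into the generated strings or running generation for one persona at a time. Both are inefficient and can degrade the quality of the generated samples. What is needed is a recommended, supported way to preserve persona metadata on each generated sample and access it after generation.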
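As a rough illustration of what preserved metadata would enable, here is a minimal plain-Python sketch. It assumes each generated sample carries the name of the persona it was generated for. `TaggedSample`, `persona_distribution`, and `samples_for` are hypothetical names used only for this example and are not part of the Ragas API.

```python
from collections import Counter
from dataclasses import dataclass, field


# Hypothetical record type (NOT a Ragas class): a generated question plus
# the persona metadata this issue asks to have preserved.
@dataclass
class TaggedSample:
    question: str
    persona_name: str
    metadata: dict = field(default_factory=dict)


def persona_distribution(samples):
    """Count how many samples were generated for each persona."""
    return Counter(s.persona_name for s in samples)


def samples_for(samples, persona_name):
    """Return the questions generated for a single persona, in order."""
    return [s.question for s in samples if s.persona_name == persona_name]


if __name__ == "__main__":
    samples = [
        TaggedSample("How do I reset my password?", "new_user"),
        TaggedSample("Which API limits apply to bulk exports?", "admin"),
        TaggedSample("Where can I find the getting-started guide?", "new_user"),
    ]
    # Verify the persona distribution of the generated set.
    print(persona_distribution(samples))  # Counter({'new_user': 2, 'admin': 1})
    # Pull out one persona's questions to check their relevance.
    print(samples_for(samples, "admin"))
```

With a persona field on every sample, the three checks listed above reduce to simple grouping and filtering, without string prefixes or per-persona generation runs.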