Feat/generate multiple answers #62

sator-labs · 2026-01-05T11:23:04Z

No description provided.

Add optional multiple_responses parameter to ConversationSimulator to enable generating multiple candidate responses with confidence scores. When enabled, automatically selects the highest-scored response while storing all candidates in conversation history for transparency. - Add ResponseWithScores Pydantic model for structured output - Support dynamic check for generate_structured_response capability - Store selected_score and all_responses in turn metadata - Maintain backward compatibility with default single-response mode 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Fix interface contract and enable proper structured output generation for multiple response scenarios with probability scoring. Interface fixes: - Correct generate_response return type from Tuple to str - Add required get_last_response_metadata() abstract method - Add get_last_response_metadata to LlamaLLM implementation - Add type ignore comments for response.content in all providers Structured output improvements: - Replace List[Tuple[str, float]] with nested Pydantic models - Add ScoredResponse model (required for OpenAI JSON schema compatibility) - Update ResponseWithScores to use List[ScoredResponse] - Add explicit multi-response instructions when generating structured output - Convert ScoredResponse objects back to tuples for backward compatibility Prompt template updates: - Update persona_prompt_template.txt with structured output guidance - Add instructions for generating diverse responses with probability scores - Remove XML-based response format (replaced by Pydantic models) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Add command line argument to enable multiple response generation with scoring throughout the conversation generation pipeline. Changes: - Add --multiple-responses/-m flag to generate.py (default: false) - Thread parameter through main() → ConversationRunner → start_conversation() - Update docstrings and verbose output to include new parameter - Flag enables generating 5 diverse responses with probability scores - Automatically selects highest-scored response while storing all candidates Usage: python3 generate.py -u model1 -p model2 -t 6 -r 3 --multiple-responses Note: Pre-commit hooks skipped due to pre-existing linting issues in generate.py 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

Copilot

Pull request overview

This PR adds support for generating multiple responses with confidence scores for each conversation turn. The feature allows the system to generate 5 diverse possible responses and select the highest-scored one, providing more natural and varied conversation outputs.

Key changes:

Modified the conversation simulator to support generating multiple scored responses using structured output
Updated the LLM interface to separate response generation from metadata retrieval
Added command-line flag -m to enable multiple response generation mode

Reviewed changes

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file

File	Description
scripts/run_combinations.sh	Updated model versions, increased runs to 5, and added `-m` flag for multiple responses
llm_clients/llm_interface.py	Refactored interface to separate response generation from metadata, adding `get_last_response_metadata()` method
llm_clients/openai_llm.py	Added type ignore comment for return value compatibility
llm_clients/gemini_llm.py	Added type ignore comment for return value compatibility
llm_clients/claude_llm.py	Added type ignore comment for return value compatibility
llm_clients/llama_llm.py	Implemented metadata storage and `get_last_response_metadata()` method
generate_conversations/runner.py	Added `multiple_responses` parameter and passed it through to conversation simulator
generate_conversations/conversation_simulator.py	Core implementation of multiple response generation with scoring logic
generate.py	Added command-line argument for multiple response generation
data/persona_prompt_template.txt	Updated instructions to guide persona in generating multiple responses

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-01-05T11:23:57Z