Description
Describe the bug
LocalEvalSetResultsManager.save_eval_set_result() writes JSON results as a JSON-encoded string (double-encoded), so the file contents look like "{\"key\": \"value\"}" instead of a JSON object.
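For illustration, here is a minimal standalone sketch of the same double-encoding pattern, using a hypothetical pydantic model rather than ADK code:

import json

import pydantic


class Result(pydantic.BaseModel):
  key: str


encoded = Result(key="value").model_dump_json()  # already a JSON string: '{"key":"value"}'
print(json.dumps(encoded))  # prints "{\"key\":\"value\"}" -- a JSON string literal, not an object

Because json.dumps() receives a str, it serializes it as a JSON string literal and escapes the inner quotes, which matches the file contents described above.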
To Reproduce
Based on https://github.com/google/adk-python/blob/v1.21.0/tests/integration/test_single_agent.py#L19-L25:
from google.adk.evaluation.agent_evaluator import AgentEvaluator
from google.adk.evaluation.eval_config import get_eval_metrics_from_config
from google.adk.evaluation.local_eval_set_results_manager import (
    LocalEvalSetResultsManager,
)
from google.adk.evaluation.simulation.user_simulator_provider import (
    UserSimulatorProvider,
)
import pytest


@pytest.mark.asyncio
async def test_eval_agent(tmp_path):
  test_file = (
      "tests/integration/fixture/home_automation_agent/simple_test.test.json"
  )
  eval_config = AgentEvaluator.find_config_for_test_file(test_file)
  eval_set = AgentEvaluator._load_eval_set_from_file(
      test_file, eval_config, initial_session={}
  )
  eval_metrics = get_eval_metrics_from_config(eval_config)
  user_simulator_provider = UserSimulatorProvider(
      user_simulator_config=eval_config.user_simulator_config
  )
  agent_for_eval = await AgentEvaluator._get_agent_for_eval(
      module_name="tests.integration.fixture.home_automation_agent",
      agent_name=None,
  )
  eval_results_by_eval_id = (
      await AgentEvaluator._get_eval_results_by_eval_id(
          agent_for_eval=agent_for_eval,
          eval_set=eval_set,
          eval_metrics=eval_metrics,
          num_runs=4,
          user_simulator_provider=user_simulator_provider,
      )
  )
  results_manager = LocalEvalSetResultsManager(agents_dir=str(tmp_path))
  for eval_case_results in eval_results_by_eval_id.values():
    results_manager.save_eval_set_result(
        app_name="test_app",
        eval_set_id=eval_set.eval_set_id,
        eval_case_results=eval_case_results,
    )
  failures = []
  for eval_case_results in eval_results_by_eval_id.values():
    eval_metric_results = (
        AgentEvaluator._get_eval_metric_results_with_invocation(
            eval_case_results
        )
    )
    failures_per_eval_case = AgentEvaluator._process_metrics_and_get_failures(
        eval_metric_results=eval_metric_results,
        print_detailed_results=True,
        agent_module=None,
    )
    failures.extend(failures_per_eval_case)
  failure_message = "Following are all the test failures.\n" + "\n".join(
      failures
  )
  assert not failures, failure_message

The saved file then contains a double-encoded JSON string (truncated):

"{\"eval_set_result_id\":\"test_app_b305bd06-38c5-4796-b9c7-d9c7454338b9_1766325534.0213041\",

Expected behavior
The saved file should contain a JSON object (e.g. {"eval_set_result_id": "...", ... }), not a JSON string.
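A quick way to check for the symptom (a sketch; the path is a placeholder for whatever file save_eval_set_result wrote under agents_dir):

import json

# Placeholder path; substitute the file written by save_eval_set_result.
path = "path/to/test_app/saved_eval_set_result.json"

with open(path, encoding="utf-8") as f:
  loaded = json.load(f)

# With the bug, loaded is a str (the inner document, still serialized);
# once fixed, it should be a dict.
print(type(loaded))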
Screenshots
N/A
Desktop (please complete the following information):
- OS: macOS
- Python version (python -V): Python 3.13.8
- ADK version (pip show google-adk): v1.21.0
Model Information:
- Are you using LiteLLM: No
- Which model is being used: gemini-2.0-flash-001
Additional context
Likely caused by double-encoding: model_dump_json() returns a JSON string which is then passed through json.dumps() again. Using model_dump() + json.dump() (or writing model_dump_json() directly) would avoid double encoding.
The offending snippet from LocalEvalSetResultsManager.save_eval_set_result:

# Convert to json and write to file.
eval_set_result_json = eval_set_result.model_dump_json()
eval_set_result_file_path = os.path.join(
    app_eval_history_dir,
    eval_set_result.eval_set_result_name + _EVAL_SET_RESULT_FILE_EXTENSION,
)
logger.info("Writing eval result to file: %s", eval_set_result_file_path)
with open(eval_set_result_file_path, "w", encoding="utf-8") as f:
  f.write(json.dumps(eval_set_result_json, indent=2))
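A possible fix, sketched against the snippet above (either variant serializes exactly once; mode="json" makes model_dump() return JSON-safe values for any nested types):

# Option 1: dump to JSON-safe Python objects, then serialize once.
with open(eval_set_result_file_path, "w", encoding="utf-8") as f:
  json.dump(eval_set_result.model_dump(mode="json"), f, indent=2)

# Option 2: let pydantic serialize, and write the string as-is.
with open(eval_set_result_file_path, "w", encoding="utf-8") as f:
  f.write(eval_set_result.model_dump_json(indent=2))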