
Fix non-string types like bool / int being saved as string in outputs.jsonl #284

Open
junya-takayama wants to merge 3 commits into main from fix_truncate_base64

Conversation

@junya-takayama
Collaborator

After the changes introduced in #233, values such as int, float, bool, and None were converted to strings when saved to outputs.jsonl.
This is problematic for flexeval_file, since the stored values no longer preserve their original types.
This PR fixes that behavior so the original types are preserved.

@junya-takayama
Collaborator Author

junya-takayama commented Apr 3, 2026

trial

toydata.jsonl

{"prompt": "Hello, are you human?", "is_acceptable": false }
{"prompt": "Hello, are you an AI?", "is_acceptable": true }

toytask.jsonnet

{
  class_path: 'ChatResponse',
  init_args: {
    eval_dataset: {
      class_path: 'JsonlChatDataset',
      init_args: {
        path: 'toydata.jsonl',
        input_template: '{{ prompt }}',
      },
    },
    batch_size: 2,
    metrics: [
      { class_path: 'OutputLengthStats' },
    ],
  },
}

command

poetry run flexeval_lm --eval_setup toytask.jsonnet --save_dir ./tmp --language_model OpenAIChatAPI --language_model.model gpt-oss-120b --language_model.api_headers.base_url ***

outputs.jsonl: ✅ The value of is_acceptable is stored as a boolean.

{"lm_output": "Hello! I’m not a human—I’m an AI language model created to help answer your questions and chat with you. How can I assist you today?", "finish_reason": "stop", "extra_info": {"prompt": "Hello, are you human?", "is_acceptable": false, "messages": [{"role": "user", "content": "Hello, are you human?"}]}, "references": [], "output_length": 132, "reasoning_text": "The user asks \"Hello, are you human?\" The assistant should respond per policy. The user question is simple. Should answer: No, I'm an AI language model. Friendly."}
{"lm_output": "Hello! Yes, I’m an AI language model here to help with your questions and requests. How can I assist you today?", "finish_reason": "stop", "extra_info": {"prompt": "Hello, are you an AI?", "is_acceptable": true, "messages": [{"role": "user", "content": "Hello, are you an AI?"}]}, "references": [], "output_length": 111, "reasoning_text": "The user asks \"Hello, are you an AI?\" We should respond politely, acknowledge we are an AI language model. The system prompt says we must comply with policies. This is trivial."}

@junya-takayama junya-takayama marked this pull request as ready for review April 3, 2026 04:52
@junya-takayama junya-takayama requested a review from moskomule April 3, 2026 04:58
Comment on lines +37 to +38
assert json.loads(_json_dumps(TestDataClass("example", True))) == {"field1": "example", "field2": True}
assert json.loads(_json_dumps(TestDataClass("example", None))) == {"field1": "example", "field2": None}
Collaborator Author

@junya-takayama junya-takayama Apr 3, 2026


Because errors occur due to the difference between Python's literals (True / False / None) and JSON's literals (true / false / null), I'm replacing literal_eval() with json.loads().
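The literal mismatch is easy to reproduce in isolation (a standalone sketch, independent of the repo's code):

```python
import ast
import json

# JSON uses lowercase true/false/null, which ast.literal_eval rejects:
try:
    ast.literal_eval('{"ok": true, "x": null}')
except ValueError as e:
    print("literal_eval failed:", e)

# json.loads maps them to Python's True/False/None:
parsed = json.loads('{"ok": true, "x": null}')
print(parsed)  # {'ok': True, 'x': None}
```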

Member

@moskomule moskomule left a comment


Thanks. I left a minor comment.

return type(o)(_truncate_base64(item) for item in o)
if isinstance(o, dict):
return {k: _truncate_base64(v) for k, v in o.items()}
if isinstance(o, int | float | bool | type(None)):
Member


L16 uses the pipe form int | float ... while L12 uses the tuple form (list, tuple, ...).
Could you make it consistent?

Collaborator Author


thanks!

L16 uses the pipe form int | float ... while L12 uses the tuple form (list, tuple, ...).
Could you make it consistent?

I fixed it.
f12bf17
