
Fix non-string types like bool / int being saved as string in outputs.jsonl #284

Open
junya-takayama wants to merge 3 commits into main from fix_truncate_base64

Conversation

@junya-takayama
Collaborator

After the changes introduced in #233, values such as int, float, bool, and None were converted to strings when saved to outputs.jsonl.
This is problematic for flexeval_file, since the stored values no longer preserve their original types.
This PR fixes that behavior so the original types are preserved.

@junya-takayama
Collaborator Author

junya-takayama commented Apr 3, 2026

trial

toydata.jsonl

{"prompt": "Hello, are you human?", "is_acceptable": false }
{"prompt": "Hello, are you an AI?", "is_acceptable": true }

toytask.jsonnet

{
  class_path: 'ChatResponse',
  init_args: {
    eval_dataset: {
      class_path: 'JsonlChatDataset',
      init_args: {
        path: 'toydata.jsonl',
        input_template: '{{ prompt }}',
      },
    },
    batch_size: 2,
    metrics: [
      { class_path: 'OutputLengthStats' },
    ],
  },
}

command

poetry run flexeval_lm --eval_setup toytask.jsonnet --save_dir ./tmp --language_model OpenAIChatAPI --language_model.model gpt-oss-120b --language_model.api_headers.base_url ***

outputs.jsonl: ✅ The value of is_acceptable is stored as a boolean.

{"lm_output": "Hello! I’m not a human—I’m an AI language model created to help answer your questions and chat with you. How can I assist you today?", "finish_reason": "stop", "extra_info": {"prompt": "Hello, are you human?", "is_acceptable": false, "messages": [{"role": "user", "content": "Hello, are you human?"}]}, "references": [], "output_length": 132, "reasoning_text": "The user asks \"Hello, are you human?\" The assistant should respond per policy. The user question is simple. Should answer: No, I'm an AI language model. Friendly."}
{"lm_output": "Hello! Yes, I’m an AI language model here to help with your questions and requests. How can I assist you today?", "finish_reason": "stop", "extra_info": {"prompt": "Hello, are you an AI?", "is_acceptable": true, "messages": [{"role": "user", "content": "Hello, are you an AI?"}]}, "references": [], "output_length": 111, "reasoning_text": "The user asks \"Hello, are you an AI?\" We should respond politely, acknowledge we are an AI language model. The system prompt says we must comply with policies. This is trivial."}

@junya-takayama junya-takayama marked this pull request as ready for review April 3, 2026 04:52
@junya-takayama junya-takayama requested a review from moskomule April 3, 2026 04:58
Comment on lines +37 to +38
assert json.loads(_json_dumps(TestDataClass("example", True))) == {"field1": "example", "field2": True}
assert json.loads(_json_dumps(TestDataClass("example", None))) == {"field1": "example", "field2": None}
Collaborator Author

@junya-takayama junya-takayama Apr 3, 2026


Because errors occur due to the difference between Python's literals (True / False / None) and JSON's literals (true / false / null), I'm replacing literal_eval() with json.loads().
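The literal mismatch is easy to reproduce in isolation (a standalone sketch, independent of the repo's code):

```python
import ast
import json

# JSON uses lowercase true/false/null, which ast.literal_eval rejects:
try:
    ast.literal_eval('{"ok": true, "x": null}')
except ValueError as e:
    print("literal_eval failed:", e)

# json.loads maps them to Python's True/False/None:
parsed = json.loads('{"ok": true, "x": null}')
print(parsed)  # {'ok': True, 'x': None}
```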

Member

@moskomule moskomule left a comment


Thanks. I left a minor comment.

return type(o)(_truncate_base64(item) for item in o)
if isinstance(o, dict):
return {k: _truncate_base64(v) for k, v in o.items()}
if isinstance(o, int | float | bool | type(None)):
Member


L16 uses the pipe form int | float ... while L12 uses the tuple form (list, tuple, ...).
Could you make it consistent?

Collaborator Author


thanks!

L16 uses the pipe form int | float ... while L12 uses the tuple form (list, tuple, ...).
Could you make it consistent?

I fixed it.
f12bf17
