@JSCU-CNI
Contributor

Attempts to fix #204

@codecov

codecov bot commented Jan 19, 2026

Codecov Report

❌ Patch coverage is 91.66667% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 83.33%. Comparing base (4bd4572) to head (87d884f).
⚠️ Report is 1 commit behind head on main.

Files with missing lines | Patch % | Lines
flow/record/jsonpacker.py | 91.66% | 1 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #205   +/-   ##
=======================================
  Coverage   83.32%   83.33%           
=======================================
  Files          35       35           
  Lines        3707     3714    +7     
=======================================
+ Hits         3089     3095    +6     
- Misses        618      619    +1     
Flag | Coverage Δ
unittests | 83.33% <91.66%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown.

@yunzheng
Member

yunzheng commented Jan 20, 2026

I went a bit down the JSONEncoder rabbit hole. json.dumps() has both a cls= and a default= argument; we currently only use default=, for unknown types.

Because fieldtypes.boolean inherits from int (we cannot inherit from bool), it's a basic known type, so default= will not be called for this field: default= is only called for "unknown" types that JSON cannot serialize.
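A minimal sketch of this behavior, assuming a hypothetical `boolean` class as a stand-in for `fieldtypes.boolean` (an `int` subclass, as described above) and a hypothetical `default` hook that raises if it is ever called:

```python
import json


class boolean(int):
    """Hypothetical stand-in for flow.record.fieldtypes.boolean.

    bool cannot be subclassed, so the real fieldtype subclasses int instead.
    """


def default(obj):
    # json only calls this hook for objects it cannot serialize natively.
    raise TypeError(f"default() called for {obj!r}")


flag = boolean(1)

# The int subclass is serialized as a plain integer; default() is never called.
print(json.dumps({"flag": flag}, default=default))  # {"flag": 1}

# Converting to a real bool first yields a JSON boolean.
print(json.dumps({"flag": bool(flag)}))  # {"flag": true}
```

The hook never fires because the encoder already knows how to serialize `int` and its subclasses; only the explicit `bool()` conversion produces a JSON `true`.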

If we want to change the serialization of basic types, we would need our own JSONEncoder class that overrides the encode() and iterencode() methods, but that seems like a bigger refactor if we want to go that route.
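For comparison, a sketch of the custom-encoder route, under these assumptions: `boolean` is again a hypothetical stand-in for `fieldtypes.boolean`, and `BooleanAwareEncoder` and `_to_builtin` are hypothetical names. In CPython's `json` module, `JSONEncoder.encode()` delegates to `iterencode()`, so overriding `iterencode()` alone is enough to normalize the data before encoding:

```python
import json


class boolean(int):
    """Hypothetical stand-in for flow.record.fieldtypes.boolean (an int subclass)."""


def _to_builtin(obj):
    # Recursively rebuild containers, turning boolean instances into real bools.
    if isinstance(obj, boolean):
        return bool(obj)
    if isinstance(obj, dict):
        return {key: _to_builtin(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [_to_builtin(item) for item in obj]
    return obj


class BooleanAwareEncoder(json.JSONEncoder):
    """Hypothetical encoder that normalizes the boolean subclass before encoding.

    JSONEncoder.encode() calls iterencode(), so overriding iterencode()
    covers both json.dumps() and json.dump().
    """

    def iterencode(self, o, _one_shot=False):
        return super().iterencode(_to_builtin(o), _one_shot)


print(json.dumps({"flag": boolean(0), "items": [boolean(1)]}, cls=BooleanAwareEncoder))
# {"flag": false, "items": [true]}
```

This still walks the whole object once before encoding, so it mainly moves the conversion step into the encoder rather than avoiding it.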

I therefore opted to just add an extra conversion step before passing the data to json.dumps(), which enforces the conversion of fieldtypes.boolean to an actual bool.

Let me know what you think.

@yunzheng changed the title from "Add instancecheck for boolean fieldtype" to "Fix JsonRecordPacker for boolean fieldtype" Jan 20, 2026
@yunzheng requested a review from Copilot January 20, 2026 11:53

Copilot AI left a comment

Pull request overview

This PR fixes an issue where boolean fieldtypes were not converted to JSON booleans when packing dictionaries (OrderedDicts) with JsonRecordPacker. The Elastic adapter uses this pattern by calling packer.pack(record._asdict()), which caused boolean fields to be serialized as integers instead of JSON booleans.

Changes:

  • Refactored boolean conversion logic into a reusable convert_basic_types() method that recursively processes dicts and lists
  • Extended pack() method to accept dictionaries in addition to Record and RecordDescriptor objects
  • Added comprehensive tests to verify dictionary packing and ensure no side effects on the input data
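The changes above can be sketched roughly as follows. This is not the code from the PR: `boolean` is a hypothetical stand-in for `fieldtypes.boolean`, and `SketchJsonPacker` is a hypothetical, stripped-down packer that only handles plain dicts. It shows a recursive conversion that builds new containers instead of mutating its input, so packing the result of `record._asdict()` leaves the original `OrderedDict` untouched:

```python
import json
from collections import OrderedDict


class boolean(int):
    """Hypothetical stand-in for flow.record.fieldtypes.boolean (an int subclass)."""


class SketchJsonPacker:
    """Hypothetical, simplified packer; the real JsonRecordPacker also
    handles Record and RecordDescriptor objects."""

    def convert_basic_types(self, obj):
        # Return new dicts/lists rather than mutating obj in place.
        if isinstance(obj, boolean):
            return bool(obj)
        if isinstance(obj, dict):
            return {key: self.convert_basic_types(value) for key, value in obj.items()}
        if isinstance(obj, list):
            return [self.convert_basic_types(item) for item in obj]
        return obj

    def pack(self, obj):
        # Dicts (including OrderedDicts) are converted, then serialized.
        if isinstance(obj, dict):
            return json.dumps(self.convert_basic_types(obj))
        raise TypeError(f"unsupported type: {type(obj).__name__}")


data = OrderedDict([("name", "host01"), ("enabled", boolean(1)), ("flags", [boolean(0)])])
packer = SketchJsonPacker()
print(packer.pack(data))  # {"name": "host01", "enabled": true, "flags": [false]}

# No side effects: the input still holds the original boolean instances.
print(type(data["enabled"]).__name__)  # boolean
```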

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

File | Description
flow/record/jsonpacker.py | Added convert_basic_types() method for recursive boolean conversion, updated pack_obj() to use it, and extended pack() to accept dicts
tests/packer/test_json_packer.py | Added tests for packing OrderedDicts and verifying no side effects on input data
tests/fieldtypes/test_boolean.py | New file containing the boolean fieldtype test (moved from test_fieldtypes.py)
tests/fieldtypes/test_fieldtypes.py | Removed test_boolean() as it was moved to its own file

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.


@JSCU-CNI
Contributor Author

JSCU-CNI commented Jan 20, 2026

Thank you for taking the time to look into this.

> I went a bit down the JSONEncoder rabbit hole. json.dumps() has both a cls= and a default= argument; we currently only use default=, for unknown types.
> Because fieldtypes.boolean inherits from int (we cannot inherit from bool), it's a basic known type, so default= will not be called for this field: default= is only called for "unknown" types that JSON cannot serialize.
>
> If we want to change the serialization of basic types, we would need our own JSONEncoder class that overrides the encode() and iterencode() methods, but that seems like a bigger refactor if we want to go that route.

Yes, I reached the same conclusion. A custom JSONEncoder may be better suited in the future, to avoid duplicating serialization logic in the JSON packer for (ordered) dicts and records.

> I therefore opted to just add an extra conversion step before passing the data to json.dumps(), which enforces the conversion of fieldtypes.boolean to an actual bool.

This seems like a neat solution for now. I hope it does not impact performance too much.

@JSCU-CNI requested a review from yunzheng January 20, 2026 13:17
@Schamper
Member

> This seems like a neat solution for now. I hope it does not impact performance too much.

Add a benchmark unit test 😄.
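A benchmark unit test along those lines could look like the sketch below. It is an assumption-laden illustration, not part of the PR: `boolean`, `convert`, `RECORD`, `measure_overhead` and `test_conversion_overhead` are all hypothetical, and it measures the overhead of a conversion step on top of plain `json.dumps()` for a small dict rather than the real `JsonRecordPacker` path. The 5x threshold is arbitrary and deliberately loose, since wall-clock timings are noisy on shared CI runners:

```python
import json
import timeit


class boolean(int):
    """Hypothetical stand-in for flow.record.fieldtypes.boolean (an int subclass)."""


def convert(obj):
    # Hypothetical conversion step, mirroring the approach discussed above.
    if isinstance(obj, boolean):
        return bool(obj)
    if isinstance(obj, dict):
        return {key: convert(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [convert(item) for item in obj]
    return obj


RECORD = {"hostname": "host01", "enabled": boolean(1), "ports": [22, 80, 443]}


def measure_overhead(number=20_000, repeat=5):
    """Return how many times slower json.dumps() becomes with the conversion step."""
    # Taking the best of several repeats reduces noise from other processes.
    baseline = min(timeit.repeat(lambda: json.dumps(RECORD), number=number, repeat=repeat))
    converted = min(
        timeit.repeat(lambda: json.dumps(convert(RECORD)), number=number, repeat=repeat)
    )
    return converted / baseline


def test_conversion_overhead():
    ratio = measure_overhead()
    assert ratio < 5.0, f"conversion makes packing {ratio:.2f}x slower"
```

Keeping the measurement in a helper and only asserting in the test function keeps the test compatible with pytest, which warns when a test function returns a value.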

@yunzheng
Member

yunzheng left a comment

Looks good to merge. A quick performance test with IPython:

In [7]: from flow.record.jsonpacker import JsonRecordPacker
   ...: from flow.record import RecordReader
   ...: with RecordReader("examples/records.json") as reader:
   ...:    records = list(reader)
   ...: packer = JsonRecordPacker()
   ...: record = records[0]

# original
In [12]: %timeit packer.pack_og(record)
6.45 μs ± 11 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

# new
In [13]: %timeit packer.pack(record)
6.47 μs ± 28.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

The difference in means (about 0.02 μs, roughly 0.3%) is negligible.

Adding a benchmark test would be nice indeed, but I'll leave that up to the author, either in this PR or a future one. If performance becomes a concern, we could also look into faster JSON modules than the stdlib one, such as orjson.

@yunzheng merged commit c3f8cd8 into fox-it:main Jan 21, 2026
29 checks passed
@yunzheng
Member

I'll create a new issue for improving JsonRecordPacker.

@yunzheng
Member

yunzheng commented Jan 21, 2026

> > This seems like a neat solution for now. I hope it does not impact performance too much.
>
> Add a benchmark unit test 😄.

Created a new issue for this here: #206

Development

Successfully merging this pull request may close these issues.

Boolean packed incorrectly by JSON packer

3 participants