-
Notifications
You must be signed in to change notification settings - Fork 13
Fix JsonRecordPacker for boolean fieldtype #205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #205 +/- ##
=======================================
Coverage 83.32% 83.33%
=======================================
Files 35 35
Lines 3707 3714 +7
=======================================
+ Hits 3089 3095 +6
- Misses 618 619 +1
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
I went a bit in the JSONEncoder rabbit hole, so, Because If we want to change the serialization of basic types, we would need our own JSONEncoder class where we override I therefore opted in to just handle an extra conversion step before passing it to Let me know what you think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes an issue where boolean fieldtypes were not properly converted to JSON booleans when packing dictionaries (OrderedDicts) using JsonRecordPacker. The Elastic adapter uses this pattern by calling packer.pack(record._asdict()), which was causing boolean fields to be serialized as integers instead of JSON booleans.
Changes:
- Refactored boolean conversion logic into a reusable
convert_basic_types()method that recursively processes dicts and lists - Extended
pack()method to accept dictionaries in addition to Record and RecordDescriptor objects - Added comprehensive tests to verify dictionary packing and ensure no side effects on the input data
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| flow/record/jsonpacker.py | Added convert_basic_types() method for recursive boolean conversion, updated pack_obj() to use it, and extended pack() to accept dicts |
| tests/packer/test_json_packer.py | Added tests for packing OrderedDicts and verifying no side effects on input data |
| tests/fieldtypes/test_boolean.py | New file containing the boolean fieldtype test (moved from test_fieldtypes.py) |
| tests/fieldtypes/test_fieldtypes.py | Removed test_boolean() as it was moved to its own file |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
|
Thank you for taking the time to look into this.
Yes I reached the same conclusion there, perhaps a custom JSONEncoder in the future would be better suited, to prevent duplicate serialization logic in the json packer for (ordered)dicts and records.
This seems like a neat solution for now. I hope it does not impact performance too much. |
Add a benchmark unit test 😄. |
yunzheng
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good for merge, quick perftest with ipython:
In [7]: from flow.record.jsonpacker import JsonRecordPacker
...: from flow.record import RecordReader
...: with RecordReader("examples/records.json") as reader:
...: records = list(reader)
...: packer = JsonRecordPacker()
...: record = records[0]
# original
In [12]: %timeit packer.pack_og(record)
6.45 μs ± 11 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
# new
In [13]: %timeit packer.pack(record)
6.47 μs ± 28.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)The mean difference should be negligible.
Adding a benchmark test would be nice indeed, but i'll leave that up to the author for this PR or future PR. If performance is a thing we could also look into more efficient json modules besides stdlib, like orjson.
|
i'll create a new issue for improving JsonRecordPacker. |
Created a new issue for this here: #206 |
Attempts to fix #204