feat(rust): add configurable size guardrails#3421
feat(rust): add configurable size guardrails#3421Zakir032002 wants to merge 19 commits intoapache:mainfrom
Conversation
…ollection_size, max_map_size) Adds three Fory builder methods that let callers cap the byte length of deserialized strings, element count of collections, and entry count of maps. When a limit is exceeded an informative Error is returned instead of a blind allocation, preventing OOM from crafted payloads. - config.rs: three Option<usize> fields - context.rs: check_string_bytes / check_collection_size / check_map_size helpers - fory.rs: builder methods for all three limits - buffer.rs: read_varuint36small() helper used by string check - serializer/string.rs: check before string allocation - serializer/collection.rs: check in generic Vec / collection read - serializer/primitive_list.rs: check for Vec<primitive> fast path - serializer/map.rs: check before HashMap / BTreeMap allocation - tests/tests/test_size_guardrails.rs: 6 integration tests - tests/tests/mod.rs: register new test module Fixes apache#3409
- Bug 1: move check_collection_size before polymorphic dispatch in read_collection_data so HashSet/LinkedList/BTreeSet with polymorphic elements no longer bypass the limit - Bug 2: adjust string byte budget for UTF-16 (len is code-units; multiply by 2 before checking) so an adversary cannot double the allocation limit by forcing UTF-16 encoding - Bug 3: remove duplicate check_bound in read_latin1_string that regressed every latin1/ASCII string on the hot deserialization path - Bug 4: move check_map_size before BTreeMap::new() for a consistent 'check before allocate' pattern matching the HashMap path
|
@chaokunyang , let me know if you need any changes |
|
Please set default value to max value of int32. And also check length is less or equal than buffer remaining size |
…g error message in read_collection_data - read_vec_data was missing the buffer-remaining check entirely; a crafted len value would pass check_collection_size (default i32::MAX) and immediately hit Vec::with_capacity causing OOM before any read. Added a precise per-element byte check using size_of::<T>() since the static type is known at this call site. - read_collection_data had a coarse lower-bound check (elements vs bytes) but the error message falsely implied a byte-level guarantee. Updated the message to accurately describe what the check ensures.
…a before Vec::with_capacity Previously a crafted size_bytes value (e.g. 400_000_000) would pass check_collection_size (default i32::MAX) and then allocate a massive Vec before read_bytes had a chance to catch the truncation. The new check compares size_bytes against the bytes actually remaining in the buffer, which is precise because primitive_list encodes the payload length in bytes — so no approximation is needed.
…_bytes check_collection_size and check_map_size are both marked #[inline(always)] but check_string_bytes was not, despite being called on every single string deserialization — the hottest deserialization path in virtually every real-world payload. Consistent with the other two guard methods.
…nd primitive_list - test_vec_buffer_truncation_rejected: exercises the new buffer-remaining guard in read_vec_data (collection.rs). Crafts a valid Vec<i32> payload then truncates it so the declared element count cannot be satisfied. - test_primitive_list_buffer_truncation_rejected: exercises the new guard in primitive_list.rs fory_read_data. Serializes a Vec<i64>, truncates the payload, and confirms rejection even with default (i32::MAX) limits. - Existing test_buffer_truncation_rejected covers the string path and remains unchanged.
f2d745c to
4057e18
Compare
| @@ -315,6 +315,9 @@ pub struct ReadContext<'a> { | |||
| xlang: bool, | |||
| max_dyn_depth: u32, | |||
| check_struct_version: bool, | |||
| max_string_bytes: usize, | |||
There was a problem hiding this comment.
Let's simplify the config options, reserve:
- max_binary_size, set default value to 64MB
- max_collection_size, set default value to 1_000_000
| @@ -472,6 +478,39 @@ impl<'a> ReadContext<'a> { | |||
| self.meta_string_resolver.read_meta_string(&mut self.reader) | |||
| } | |||
|
|
|||
| #[inline(always)] | |||
| pub fn check_string_bytes(&self, byte_len: usize) -> Result<(), Error> { | |||
There was a problem hiding this comment.
For string, we don't have to check bytes length, bevcause we don't have preallocaiton for string
- Add max_binary_size and max_collection_size guardrails - Enforce collection size limits in Vec and Map deserializers - Add buffer-remaining cross-checks to prevent pre-allocation attacks - Simplify configuration by consolidating string and map limits - Fixes apache#3409
|
Heyy @chaokunyang , Done with the changes as you requested |
| @@ -472,6 +476,28 @@ impl<'a> ReadContext<'a> { | |||
| self.meta_string_resolver.read_meta_string(&mut self.reader) | |||
| } | |||
|
|
|||
| #[inline(always)] | |||
| pub fn check_binary_size(&self, byte_len: usize) -> Result<(), Error> { | |||
There was a problem hiding this comment.
I don't think we need such methods? ANd it's even not used? why you add an unused method?
8845190 to
7891f39
Compare
|
@chaokunyang — You're right, that method was unnecessary and unused. The implementation now only enforces max_collection_size at actual preallocation sites (collections, maps, primitive lists), and includes the buffer-remaining check as requested. Config has been simplified to keep only max_binary_size (reserved, default 64MB) and max_collection_size (default 1_000_000). All checks are passing. Please let me know if anything else should be adjusted. |
| .bf | ||
| .len() | ||
| .saturating_sub(context.reader.cursor); | ||
| if len as usize > remaining { |
There was a problem hiding this comment.
This won't work for streaming deserialization
There was a problem hiding this comment.
Could you fix this? in stream deserialization, the buffer length doesn't represent remaining bytes anymore
Summary
Fixes #3409
Adds three opt-in
Forybuilder methods that let callers enforce upper bounds on the size of data allocated during deserialization. Without these guards a crafted payload can contain an absurdly large length prefix, causingVec::with_capacity/ string allocation to exhaust heap memory before a single byte of real data is read.All three limits default to
None(no limit), so this is 100 % backwards-compatible.Testing