Skip to content

feat(go): add go desrialization support via io streams#3374

Open
ayush00git wants to merge 30 commits intoapache:mainfrom
ayush00git:feat/go-deserialization
Open

feat(go): add go desrialization support via io streams#3374
ayush00git wants to merge 30 commits intoapache:mainfrom
ayush00git:feat/go-deserialization

Conversation

@ayush00git
Copy link
Contributor

@ayush00git ayush00git commented Feb 20, 2026

Why?

To enable stream-based deserialization in Fory's Go library, allowing for direct reading from io.Reader without pre-buffering the entire payload. This improves efficiency for network and file-based transport and brings the Go implementation into feature-parity with the python and C++ libraries.

What does this PR do?

1. Stream Infrastructure in go/fory/buffer.go

Enhanced ByteBuffer to support io.Reader with an internal sliding window and automatic filling.

  • Added reader io.Reader and minCap int fields.
  • Implemented fill(n int, err *Error) bool for on-demand data fetching and buffer compaction.
  • Added CheckReadable(n) and Skip(n) memory-safe routines that pull from the underlying stream when necessary to avoid out-of-bounds panics.
  • Updated ReadBinary and ReadBytes to safely copy slices when streaming to prevent silent data corruption on compaction.
  • Updated all Read* methods (fixed-size, varint, tagged) to fetch data from the reader safely if not cached.

2. Stateful StreamReader in go/fory/stream.go

Added the StreamReader feature to support true, stateful sequential stream reads.

  • Introduced StreamReader which persists the buffered byte window and TypeResolver metadata (Meta Sharing) across multiple object decodes on the same stream, decoupled from Fory to mirror the C++ ForyInputStream implementation.
  • Added fory.DeserializeFromStream(sr, target) method to process continuous streamed data.
  • Added DeserializeFromReader method as an API for simple one-off stream object reads.

3. Stream-Safe Deserialization Paths

Updated internal deserialization pipelines in struct.go and type_def.go to be stream-safe:

  • Integrated CheckReadable bounds-checking into the struct.go fast paths for fixed-size primitives.
  • Safely rewrote schema-evolution skips (skipTypeDef) in type_def.go to use bounds-checked Skip() rather than unbounded readerIndex overrides.

4. Comprehensive Stream Tests

  • Built a custom oneByteReader wrapper (go/fory/test_helper_test.go) that artificially feeds the deserialization engine exactly 1 byte at a time.
  • Migrated the global test suite (struct_test.go, primitive_test.go, slice_primitive_test.go, etc.) to run all standard tests through this aggressive 1-byte fragmented stream reader via a new testDeserialize helper to guarantee total stream robustness.

Related issues

Closes #3302

Does this PR introduce any user-facing change?

  • Does this PR introduce any public API change? (NewStreamReader, DeserializeFromStream, DeserializeFromReader, NewByteBufferFromReader)
  • Does this PR introduce any binary protocol compatibility change?

Benchmark

Added boundaries checks did not cause performance regressions in the structSerializer fast paths. Tested using benchstat against the main branch.

@ayush00git ayush00git changed the title feat(go): add go desrialization support via transport streams feat(go): add go desrialization support via io streams Feb 20, 2026
@ayush00git
Copy link
Contributor Author

Hey @chaokunyang
Have a review and let me know the changes

@Zakir032002
Copy link

hey @ayush00git, looked through this and the main issue i see is in DeserializeFromReader
it calls ResetWithReader at the start of every call:

func (f *Fory) DeserializeFromReader(r io.Reader, v any) error {
    defer f.resetReadState()
    f.readCtx.buffer.ResetWithReader(r, 0) // this wipes the prefetch window every time

so if fill() reads ahead past the first object boundary (which it will), those bytes
are gone on the next call. sequential decode from one stream is broken:

for {
    var msg Msg
    f.DeserializeFromReader(conn, &msg) // bytes after first object get thrown away
}

if you look at how he handles this for c++/python — the Buffer is constructed
from the stream once and passed to each deserialize call directly. the buffer holds
state across calls, it's never reset between objects. the python test
test_stream_deserialize_multiple_objects_from_single_stream shows this exactly —
same reader buffer passed to multiple fory.deserialize() calls.

the go version probably needs something similar — a stream reader type that owns the
buffer and gets reused across deserializations rather than resetting on each call.

Happy to discuss if I'm misreading the flow here

@ayush00git
Copy link
Contributor Author

Hiii @Zakir032002
Thanks for noticing this, exactly this is a bug in the implementation from my side. yes the call would clear any prefetched data from the ByteBuffer making the sequential reads from the stream impossible, also it was clearing the typemetadata as well. thanks for mentioning this, i'll look at the c++ python implementation to correct the deserializer.

@Zakir032002
Copy link

hey @ayush00git , one more thing — ReadBinary and ReadBytes return a direct slice into
b.data:

v := b.data[b.readerIndex : b.readerIndex+length]
return v

the problem is fill() compacts the buffer in-place:

copy(b.data, b.data[b.readerIndex:])

so if someone reads a []byte field and holds onto that slice, then the next
read triggers a fill() — the compaction just overwrote the bytes they're
still holding. no error, no panic, just wrong data.

in stream mode you probably want to copy before returning instead of aliasing:

if b.reader != nil {
    result := make([]byte, length)
    copy(result, b.data[b.readerIndex:b.readerIndex+length])
    b.readerIndex += length
    return result
}

in-memory path stays as is.

@Zakir032002
Copy link

also noticed — ReadVarUint32Small7 only does fill(1) for the first byte, but if that byte has 0x80 set it falls through to continueReadVarUint32 which isn't touched in this PR. so in stream mode, if a multi-byte varint straddles a chunk boundary, the continuation bytes may not be in the buffer yet — you either get a BufferOutOfBoundError or silently read the wrong bytes depending on what's sitting at that position in the buffer.

easiest fix is probably just routing the multi-byte case through readVarUint32Slow since that's already stream-aware after your changes. or adding fill(1) guards inside continueReadVarUint32 directly, either works.

Happy to discuss if I'm misreading the flow here

@ayush00git
Copy link
Contributor Author

Hey @Zakir032002
Sorry i'm a bit busy with my exams, as i get free, i'll review the comments

@ayush00git
Copy link
Contributor Author

Hii @Zakir032002
Thanks for pointing out the flows,

  • The ReadFromDeserializer and returning a direct slice into the data stream are wrongly implemented by me, thanks for suggesting the chnages to fix them as well.

But i think you misunderstood the ReadVarUint32Small7. We already have a check condition -

if len(b.data)-readIdx >= 5 {

}

If we are near a chunk boundary (less than 5 bytes remaining in the buffer), the execution completely skips continueReadVarUint32 and jumps straight to readVaruint36Slow. I don't think this part need any changes

@ayush00git
Copy link
Contributor Author

I've added the StreamReader which now creates a copy slice during desrialization to preserve the data between sequential desrialization calls. DeserializeFromReader only is there if the user wants to deserialize a single struct and don't want a stream overhead for that.

@ayush00git
Copy link
Contributor Author

Hii @chaokunyang
Have a look and let me know the changes.

@chaokunyang
Copy link
Collaborator

chaokunyang commented Mar 3, 2026

Please take #3307 as reference to finish the remaining works. And create a Deseralize help methods in tests, then use that instead of fory.Deserialize for deserialization, and in the Deseralize test helper, first deserialize from bytes, then wrap it into a OneByteStream to deserialize it to ensure deserialization works.

Then run benchmarks/go to compare with asf/main to enure your code change don't introduce any performance regression.

@ayush00git
Copy link
Contributor Author

Hey @chaokunyang
I discovered a bug in the struct implementation. In struct.go, there is a highly optimized "fast-path" that uses unsafe pointers to read structs from memory instantly, and while reading byte 1 by 1 it is failing tests, because it assumes entire memory buffer was already available.

@ayush00git
Copy link
Contributor Author

Hii @chaokunyang
The PR is now aligned with the #3307

go/fory/fory.go Outdated
// It maintains the ByteBuffer and ReadContext state across multiple Deserialize calls,
// preventing data loss from prefetched buffers and preserving TypeResolver metadata
// (Meta Sharing) across object boundaries.
type StreamReader struct {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create a stream.go and put it there

go/fory/fory.go Outdated

// NewStreamReader creates a new StreamReader that reads from the provided io.Reader.
// The StreamReader owns the buffer and maintains state across sequential Deserialize calls.
func (f *Fory) NewStreamReader(r io.Reader) *StreamReader {
Copy link
Collaborator

@chaokunyang chaokunyang Mar 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks really strange that the StreamReader is coupled with Fory.

Please read fory c++ implementation detaily and refine this PR, especially for API surface

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yaa in c++ the streamreader acted as a standalone type which had nothing to do with deserialization it just know to fetch bytes when needed, let me implement the same for go

@ayush00git
Copy link
Contributor Author

@chaokunyang
the api design now matches with c++, is there any other modification ?

@ayush00git ayush00git requested a review from chaokunyang March 3, 2026 17:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Go] Streaming Deserialization Support For Go

3 participants