Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
77 changes: 77 additions & 0 deletions error-reproduction/nesting/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,77 @@
# Disk buffer case

```bash
# This works always
cat depth32_nesting.json | vector --config disk/test_recursion_sender.yaml

cat depth33_nesting.json | vector --config disk/test_recursion_sender.yaml
# -->
# 2026-03-10T18:15:57.170671Z ERROR sink{component_kind="sink" component_id=blackhole
# component_type=blackhole}: vector_buffers::internal_events: Error encountered during
# buffer read. error=failed to decoded record: InvalidProtobufPayload
# error_code="decode_failed" error_type="reader_failed" stage="processing"


# on subsequent runs of either:
cat depth33_nesting.json | vector --config disk/test_recursion_sender.yaml
# or
cat depth32_nesting.json | vector --config disk/test_recursion_sender.yaml
# -->
# 2026-03-10T18:16:01.506210Z ERROR vector::topology::builder: Configuration error.
# error=Sink "blackhole": error occurred when building buffer: failed to build individual
# stage 0: failed to seek to position where writer left off: failed to validate the last
# written record: failed to decoded record: InvalidProtobufPayload
# internal_log_rate_limit=false

# reset using
rm -rf /tmp/vector
```


# Vector source-sink case

```bash

# In terminal 1:

vector --config source-sink/test_recursion_receiver.yaml

# In terminal 2:

# This works always
cat depth32_nesting.json| vector --config source-sink/test_recursion_sender.yaml

cat depth33_nesting.json | vector --config disk/test_recursion_sender.yaml
# -->
# 2026-03-10T18:24:30.268256Z WARN sink{component_kind="sink" component_id=sender
# component_type=vector}:request{request_id=1}: vector::sinks::util::retries: Retrying
# after error. error=Request failed: status: Internal, message: "failed to decode Protobuf
# message: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.
# fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.
# fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.
# fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.
# fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.
# fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.
# fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.
# fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.
# fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.
# fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.
# fields: Value.kind: ValueMap.fields: Value.kind: ValueMap.fields: Value.kind: Log.
# fields: EventWrapper.event: PushEventsRequest.events: recursion limit reached", details:
# [], metadata: MetadataMap { headers: {"content-type": "application/grpc", "date": "Tue,
# 10 Mar 2026 18:24:30 GMT", "content-length": "0"} }


# 2026-03-10T18:24:20.946874Z WARN sink{component_kind="sink" component_id=sender
# component_type=vector}:request{request_id=1}: vector::sinks::util::retries: Internal log
# [Retrying after error.] is being suppressed to avoid flooding.

# ^ above two warnings repeat until shutdown after 1 minute.

# 2026-03-10T18:25:20.143568Z ERROR vector::topology::running: components="sender" Failed
# to gracefully shut down in time. Killing components. internal_log_rate_limit=false

# This still works
cat depth32_nesting.json| vector --config source-sink/test_recursion_sender.yaml

```
2 changes: 2 additions & 0 deletions error-reproduction/nesting/depth32_nesting.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi": {"hi":true}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}

2 changes: 2 additions & 0 deletions error-reproduction/nesting/depth33_nesting.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi":{"hi": {"hi":{"hi":true}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}}

28 changes: 28 additions & 0 deletions error-reproduction/nesting/disk/test_recursion_sender.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
data_dir: /tmp/vector
api:
enabled: true
address: 0.0.0.0:8686
playground: false
sources:
stdin:
type: stdin
transforms:
parse_json:
type: remap
inputs:
- stdin
source: |
m, e = parse_json(.message)
if e == null {
.message = m
}
sinks:
blackhole:
type: blackhole
inputs:
- parse_json
print_interval_secs: 10
buffer:
type: disk
max_size: 268435488
when_full: block
44 changes: 44 additions & 0 deletions error-reproduction/nesting/max-depth/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
# max_depth disk buffer reproduction test

Demonstrates that `parse_json(.message, max_depth: 3)` prevents the
`InvalidProtobufPayload` disk buffer corruption caused by deeply nested JSON.

## Background

The `InvalidProtobufPayload` error occurs when `parse_json` restores deeply
nested JSON (33+ levels) into `vrl::Value` objects that exceed prost's 100-level
recursion limit during disk buffer serialization. The fix (`fd6ad3af`) added
`max_depth: 3` to bound the parsing depth.

## Test configs

| Config | `parse_json` call | Expected result with depth 33 input |
|--------|-------------------|-------------------------------------|
| `test_max_depth_sender.yaml` | `parse_json(.message, max_depth: 3)` | Succeeds — nesting bounded |
| `test_no_depth_limit_sender.yaml` | `parse_json(.message)` | Fails — `InvalidProtobufPayload` |

Both configs use `stdin` source, `remap` transform, and `blackhole` sink with
disk buffering (268 MB, block when full).

## Running the tests

Uses `depth33_nesting.json` from the parent directory (`error-reproduction/nesting/`).

```bash
# Clean up between runs
rm -rf /tmp/vector-max-depth

# Test 1: max_depth: 3 with depth 33 — should SUCCEED
cat error-reproduction/nesting/depth33_nesting.json | \
vector --config error-reproduction/nesting/max-depth/test_max_depth_sender.yaml

# Clean up
rm -rf /tmp/vector-max-depth

# Test 2: no limit with depth 33 — should FAIL with InvalidProtobufPayload
cat error-reproduction/nesting/depth33_nesting.json | \
vector --config error-reproduction/nesting/max-depth/test_no_depth_limit_sender.yaml

# Clean up
rm -rf /tmp/vector-max-depth
```
28 changes: 28 additions & 0 deletions error-reproduction/nesting/max-depth/test_max_depth_sender.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
data_dir: /tmp/vector-max-depth
api:
enabled: true
address: 0.0.0.0:8686
playground: false
sources:
stdin:
type: stdin
transforms:
parse_json:
type: remap
inputs:
- stdin
source: |
m, e = parse_json(.message, max_depth: 3)
if e == null {
.message = m
}
sinks:
blackhole:
type: blackhole
inputs:
- parse_json
print_interval_secs: 10
buffer:
type: disk
max_size: 268435488
when_full: block
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
data_dir: /tmp/vector-max-depth
api:
enabled: true
address: 0.0.0.0:8686
playground: false
sources:
stdin:
type: stdin
transforms:
parse_json:
type: remap
inputs:
- stdin
source: |
m, e = parse_json(.message)
if e == null {
.message = m
}
sinks:
blackhole:
type: blackhole
inputs:
- parse_json
print_interval_secs: 10
buffer:
type: disk
max_size: 268435488
when_full: block
170 changes: 170 additions & 0 deletions error-reproduction/nesting/mermaid.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,170 @@
---
title: "Disk Buffer: InvalidProtobufPayload Root Cause"
---
flowchart TD
%% ═══════════════════════════════════════════
%% CONVENTIONS
%% Rectangle [""] = object instance
%% Diamond {""} = decision / conditional
%% Stadium ([""]) = terminal state
%% Cylinder [("")] = persistent storage
%% Solid arrow --> = method call
%% Dotted arrow -.-> = data flow / annotation
%% ═══════════════════════════════════════════

subgraph WRITE["WRITE PATH"]
direction TB

W_RW["RecordWriter
writer.rs:372"]

W_EA["EventArray
ser.rs:81"]

W_PROTO["proto::EventArray
prost generated"]

W_BUF["encode_buf — bytes P
writer.rs:412"]

W_REC["Record
record.rs:51-73
─────────────
.checksum: CRC32 of id+meta+P
.id: u64
.metadata: u32
.payload: bytes P"]

W_DISK[("Data File
on disk")]

W_RW -->|".archive_record(id, record)
writer.rs:450"| W_EA

W_EA -->|".encode(&mut buf)
ser.rs:94-100"| W_PROTO

W_PROTO -->|"prost Message::encode()
✅ NO RECURSION LIMIT"| W_BUF

W_BUF -->|"Record::with_checksum()
record.rs:121-129"| W_REC

W_REC -->|"rkyv Serialize + write

Check failure on line 53 in error-reproduction/nesting/mermaid.md

View workflow job for this annotation

GitHub Actions / Check Spelling

`rkyv` is not a recognized word. (unrecognized-spelling)
writer.rs:499-510"| W_DISK
end

W_DISK -.->|"same bytes on disk"| R_DISK

subgraph READ["READ PATH"]
direction TB

R_DISK[("Data File
on disk")]

R_RR["RecordReader
reader.rs:198
─────────────
.aligned_buf: AlignedVec
.checksummer: Hasher"]

Check failure on line 69 in error-reproduction/nesting/mermaid.md

View workflow job for this annotation

GitHub Actions / Check Spelling

`checksummer` is not a recognized word. (unrecognized-spelling)

R_DISK -->|".fill_buf() + read N bytes
reader.rs:322-334"| R_RR

%% ── Gate 1: rkyv ──

Check warning on line 74 in error-reproduction/nesting/mermaid.md

View workflow job for this annotation

GitHub Actions / Check Spelling

`rkyv` is not a recognized word. (unrecognized-spelling)

R_RR -->|"check_archived_root‹Record›(buf)
record.rs:193 → ser.rs:88"| D_RKYV{"rkyv archive

Check failure on line 77 in error-reproduction/nesting/mermaid.md

View workflow job for this annotation

GitHub Actions / Check Spelling

`RKYV` is not a recognized word. (unrecognized-spelling)

Check warning on line 77 in error-reproduction/nesting/mermaid.md

View workflow job for this annotation

GitHub Actions / Check Spelling

`rkyv` is not a recognized word. (unrecognized-spelling)
structurally valid?"}

D_RKYV -->|no| E_DESER(["FailedDeserialization

Check failure on line 80 in error-reproduction/nesting/mermaid.md

View workflow job for this annotation

GitHub Actions / Check Spelling

`DESER` is not a recognized word. (unrecognized-spelling)

Check warning on line 80 in error-reproduction/nesting/mermaid.md

View workflow job for this annotation

GitHub Actions / Check Spelling

`RKYV` is not a recognized word. (unrecognized-spelling)
record.rs:27
→ skip entire file"])

D_RKYV -->|yes| R_AR["ArchivedRecord

Check warning on line 84 in error-reproduction/nesting/mermaid.md

View workflow job for this annotation

GitHub Actions / Check Spelling

`RKYV` is not a recognized word. (unrecognized-spelling)
record.rs:132"]

%% ── Gate 2: checksum ──

R_AR -->|".verify_checksum(hasher)
record.rs:144-154"| D_CRC{"CRC32(id + meta + payload)
== stored checksum?"}

D_CRC -->|no| E_CORRUPT(["Corrupted
record.rs:25
→ skip entire file"])

D_CRC -->|"yes"| R_TOKEN["ReadToken
reader.rs:30-33
─────────────
Under A1 + A2:
payload bytes B == bytes P"]

%% ── Gate 3: metadata ──

R_TOKEN -->|".read_record(token)
reader.rs:364-378"| R_AR2["ArchivedRecord
record.rs:132
─────────────
.payload() → bytes B
record.rs:139-141"]

R_AR2 -->|"decode_record_payload(record)
reader.rs:1135-1155"| D_META{"T::Metadata::from_u32()
and T::can_decode()?
reader.rs:1140-1144"}

D_META -->|no| E_INCOMPAT(["Incompatible

Check failure on line 117 in error-reproduction/nesting/mermaid.md

View workflow job for this annotation

GitHub Actions / Check Spelling

`INCOMPAT` is not a recognized word. (unrecognized-spelling)
reader.rs:1145"])

D_META -->|yes| R_DEC["Encodable::decode impl
for EventArray
ser.rs:103-118"]

%% ── Attempt 1 ──

R_DEC -->|"proto::EventArray::decode(B.clone())
ser.rs:108"| R_PA["proto::EventArray
prost Message::decode
❌ RECURSION_LIMIT = 100"]

R_PA --> D_EA{"EventArray
decode succeeded?"}

D_EA -->|yes| S_OK1(["Ok — return EventArray"])

%% ── Attempt 2 ──

D_EA -->|"no — Err discarded
.or_else ser.rs:110"| R_PW["proto::EventWrapper
prost Message::decode
❌ RECURSION_LIMIT = 100"]

R_PW -->|"proto::EventWrapper::decode(B)
ser.rs:111"| D_EW{"EventWrapper
decode succeeded?"}

D_EW -->|yes| S_OK2(["Ok — return EventArray"])

D_EW -->|"no — Err discarded
.map_err ser.rs:113"| E_IPP(["DecodeError::InvalidProtobufPayload

Check failure on line 150 in error-reproduction/nesting/mermaid.md

View workflow job for this annotation

GitHub Actions / Check Spelling

`IPP` is not a recognized word. (unrecognized-spelling)
ser.rs:113
──────────
SOLE PRODUCTION SITE"])
end

%% ═══ STYLING ═══
style WRITE fill:#f0f4ff,stroke:#4263eb,stroke-width:2px,color:#000
style READ fill:#fff5f5,stroke:#e03131,stroke-width:2px,color:#000

style W_PROTO fill:#d0ebff,stroke:#1971c2
style R_PA fill:#ffe3e3,stroke:#c92a2a
style R_PW fill:#ffe3e3,stroke:#c92a2a

style S_OK1 fill:#b2f2bb,stroke:#2f9e44,color:#000
style S_OK2 fill:#b2f2bb,stroke:#2f9e44,color:#000

style E_DESER fill:#ffe8cc,stroke:#e8590c,color:#000

Check warning on line 167 in error-reproduction/nesting/mermaid.md

View workflow job for this annotation

GitHub Actions / Check Spelling

`DESER` is not a recognized word. (unrecognized-spelling)
style E_CORRUPT fill:#ffe8cc,stroke:#e8590c,color:#000
style E_INCOMPAT fill:#ffe8cc,stroke:#e8590c,color:#000

Check warning on line 169 in error-reproduction/nesting/mermaid.md

View workflow job for this annotation

GitHub Actions / Check Spelling

`INCOMPAT` is not a recognized word. (unrecognized-spelling)
style E_IPP fill:#ff6b6b,stroke:#c92a2a,color:#fff,stroke-width:3px

Check warning on line 170 in error-reproduction/nesting/mermaid.md

View workflow job for this annotation

GitHub Actions / Check Spelling

`IPP` is not a recognized word. (unrecognized-spelling)
Loading
Loading