in_tail: fix stream_offset not set when position restored from database #11674

Open
gkarpeev wants to merge 1 commit into fluent:master from gkarpeev:fix/offset-key-db-restore

@gkarpeev gkarpeev commented Apr 4, 2026

in_tail: fix offset_key reporting relative offset after DB position restore

Fixes #11670

Summary

When offset_key is configured and the file position is restored from the SQLite database (after restart), stream_offset is not set from the DB value. It stays at 0, causing offset_key to report relative offsets instead of absolute byte positions.

The fix sets stream_offset = file->offset in the DB branch of set_file_position(), guarded by the decompression_context == NULL check to match the existing pattern at the end of the function (compressed files track stream offset differently).
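
The change can be pictured with a minimal, self-contained sketch. The struct and function names here (`tail_file_sketch`, `restore_position_from_db`) are hypothetical stand-ins, not the actual Fluent Bit definitions; only the fields `offset`, `stream_offset`, and `decompression_context`, and the NULL guard, are taken from the description above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for the tail plugin's file state. */
struct tail_file_sketch {
    int64_t offset;               /* read position in the file on disk */
    int64_t stream_offset;        /* value offset_key reports, per the PR text */
    void *decompression_context;  /* non-NULL only for compressed files */
};

/*
 * Sketch of the DB branch of set_file_position(): db_offset is the
 * position loaded from the database for this file (assumed input).
 */
static void restore_position_from_db(struct tail_file_sketch *file,
                                     int64_t db_offset)
{
    file->offset = db_offset;

    /* The fix: without this assignment stream_offset stays at 0. */
    if (file->decompression_context == NULL) {
        file->stream_offset = file->offset;
    }
}
```

Leaving `stream_offset` untouched when `decompression_context` is set mirrors the note above that compressed files track stream offset differently.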

Testing

Configuration:

pipeline:
  inputs:
  - name: tail
    tag: test
    path: /mnt/input/*.log
    path_key: file_path
    offset_key: file_offset
    read_from_head: true
    db: /fluent-bit/data/test.db
    threaded: true
    mem_buf_limit: 30Mb
    buffer_chunk_size: 32Kb
    buffer_max_size: 2Mb

  outputs:
  - name: tcp
    match: test
    host: receiver
    port: 5170
    format: json_lines
    workers: 1

Steps:

  1. Start Fluent Bit, let it read a large file partially (DB offset > 0)
  2. Restart Fluent Bit
  3. Append new lines to the file
  4. Capture output with tcpdump

DB state confirming correct offset tracking:

sqlite> SELECT * FROM in_tail_files;
3|/mnt/input/test.log|9840308|33|1774994106|0

Before fix -- offsets start from 0 instead of 9,840,308 after restart:

{"file_path":"/mnt/input/test.log","file_offset":0,"log":"..."}
{"file_path":"/mnt/input/test.log","file_offset":143,"log":"..."}

After fix -- offsets match the DB position:

{"file_path":"/mnt/input/test.log","file_offset":9840308,"log":"..."}
{"file_path":"/mnt/input/test.log","file_offset":9840451,"log":"..."}
  • [N/A] Valgrind -- change is a single assignment, no allocation
  • [N/A] Packaging -- no packaging changes
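
The before/after samples are consistent with each other: each "after fix" value equals the DB offset plus the corresponding "before fix" value. A small sketch of that arithmetic, using a hypothetical helper `absolute_offset` that is not part of Fluent Bit:

```c
#include <stdint.h>

/*
 * Hypothetical helper: the absolute byte offset is the position restored
 * from the DB plus the offset measured from that position.
 */
static int64_t absolute_offset(int64_t db_offset, int64_t relative_offset)
{
    return db_offset + relative_offset;
}
```

With the values from this test run, `absolute_offset(9840308, 0)` gives 9840308 and `absolute_offset(9840308, 143)` gives 9840451, matching the two records captured after the fix.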

Documentation

No documentation changes needed. offset_key already documents that it reports the file byte offset; this fix makes the behavior match the documentation.

Backporting

This bug has been present since stream_offset was introduced in v3.0.0 (before that, v1.7-v2.x used file->offset directly, which worked correctly). It affects all versions >= v3.0.0 with offset_key + db. Confirmed on v3.0.0, v4.2.0, v5.0.0, v5.0.1. Backport to stable branches is recommended.

Summary by CodeRabbit

Bug Fixes

  • Fixed a bug where stream_offset was not initialized when a file position was restored from the database. With offset_key configured, records read after a restart reported offsets relative to the restored position instead of absolute byte offsets. The fix initializes stream_offset in the database restore path of set_file_position().

coderabbitai bot commented Apr 4, 2026

Walkthrough

Fixed the set_file_position() function in the tail input plugin to initialize stream_offset when file position is restored from the database, ensuring consistent offset tracking across Fluent Bit restarts.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Tail offset tracking fix: plugins/in_tail/tail_file.c | Added stream_offset initialization in the database file position restoration path to prevent offset values from resetting to 0 after restart. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Suggested reviewers

  • edsiper

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title directly describes the main fix: stream_offset initialization when database position restoration occurs, which is the core change in the code. |
| Linked Issues check | ✅ Passed | The code changes directly address issue #11670 by initializing stream_offset when position is restored from database in the DB code path, ensuring offset_key reports absolute offsets consistent with database values. |
| Out of Scope Changes check | ✅ Passed | The changes are narrowly scoped to fixing stream_offset initialization in the database restore path only, with no additional modifications beyond what is required by the linked issue. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
plugins/in_tail/tail_file.c (1)

1061-1064: Consider adding a regression test for restart + DB + offset_key.

The fix is correct; adding automated coverage for this path would help prevent regressions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/in_tail/tail_file.c` around lines 1061 - 1064, Add a regression test
that reproduces the restart + DB + offset_key codepath where
decompression_context is NULL so stream_offset is set from offset; specifically,
create a test that writes compressed and uncompressed input, configures the tail
plugin with db enabled and a specific offset_key, perform a restart (stop/start)
and assert that after restart the tracked file state (using file->offset and
file->stream_offset behavior) resumes correctly (no data duplication or loss).
Target the tail plugin integration tests that exercise offset persistence and
reference the symbols offset_key, decompression_context, stream_offset and
offset to ensure the case where decompression_context == NULL is covered and
guarded against regressions.
ℹ️ Review info
📥 Commits

Reviewing files that changed from the base of the PR and between 3e414ac and ebf9719.

📒 Files selected for processing (1)
  • plugins/in_tail/tail_file.c

Fixes fluent#11670

Signed-off-by: Gleb Karpeev <gkarpeev@amdocs.ru>
Successfully merging this pull request may close these issues.

tail: offset_key inconsistent with stored offset after restart
