http: fix potential FD leak when zombie streams prevent connection close #42859

wdauchy · 2026-01-05T16:43:30Z

Commit Message:
I am chasing a case where a client sends POST requests on an HTTP/1.1 connection and is systematically denied. The client immediately reuses the connection. In some cases we seem to hit a race in which the number of open file descriptors explodes.

When a stream becomes a zombie (waiting for codec encode completion) and the connection is in DrainState::Closing, the connection is never closed, because checkForDeferredClose() is not called once the zombie stream is finally destroyed.
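
To make the term "zombie" concrete, here is a minimal, self-contained C++ model of a stream whose destruction is deferred until the codec reports that encoding has finished. Only that idea comes from the description above; the `ZombieModel` type and its members are hypothetical and are not Envoy's `ActiveStream`.

```cpp
#include <cassert>

// Hypothetical model of a stream's end of life; not Envoy's ActiveStream.
struct ZombieModel {
  bool encode_complete = false;  // codec has flushed the response bytes
  bool is_zombie = false;        // logically finished, waiting on the codec
  bool destroyed = false;

  // The filter chain is done (e.g. a local 403 was sent). If the codec still
  // has bytes in flight, the stream cannot be freed yet and becomes a zombie.
  void onResponseComplete() {
    if (encode_complete) {
      destroyed = true;
    } else {
      is_zombie = true;
    }
  }

  // Codec callback: encoding finished, so a pending zombie can now be destroyed.
  void onCodecEncodeComplete() {
    encode_complete = true;
    if (is_zombie) {
      is_zombie = false;
      destroyed = true;
    }
  }
};
```

In this model the stream outlives the filter chain for exactly as long as the codec is busy, which is the window in which the connection-close decision can be missed.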

This commonly occurs with HTTP/1.1 connections when:

- ext_authz (or a similar filter) denies a request early, sending a 403 response before the full request body is received
- The connection is (potentially) marked DrainState::Closing
- The stream becomes a zombie waiting for the codec to finish encoding
- When onCodecEncodeComplete() fires, doDeferredStreamDestroy() is called but checkForDeferredClose() is not, leaving the connection open
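
The sequence above can be reproduced in a toy model of the pre-fix behaviour. This is a sketch under stated assumptions: `BuggyConnManager` and its fields are hypothetical, and only the method names `checkForDeferredClose()`, `doDeferredStreamDestroy()` and `onCodecEncodeComplete()` and the state `DrainState::Closing` are taken from the PR text; the real ConnectionManagerImpl logic is more involved.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified connection manager; not Envoy's ConnectionManagerImpl.
enum class DrainState { NotDraining, Draining, Closing };

struct BuggyConnManager {
  DrainState drain_state = DrainState::NotDraining;
  std::size_t active_streams = 0;
  bool zombie_pending = false;
  bool connection_open = true;

  // Close once draining has reached Closing and no stream is left.
  void checkForDeferredClose() {
    if (drain_state == DrainState::Closing && active_streams == 0) {
      connection_open = false;
    }
  }

  void doDeferredStreamDestroy() { --active_streams; }

  // Pre-fix behaviour: the zombie is destroyed, but nothing re-evaluates
  // whether the connection may now be closed.
  void onCodecEncodeComplete() {
    if (zombie_pending) {
      zombie_pending = false;
      doDeferredStreamDestroy();
    }
  }
};
```

Driving it through the listed steps (one denied POST, connection marked Closing, 403 still being encoded) leaves `active_streams == 0` while `connection_open` stays `true`.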

In scenarios where clients try to reuse the same HTTP/1.1 connection and requests are systematically denied (e.g., by ext_authz), this causes rapid FD exhaustion as each denied request leaves an orphaned connection.
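
A back-of-the-envelope sketch of how fast this exhausts FDs, assuming (as the paragraph states) that every denied request strands exactly one connection. The function name and all inputs are hypothetical; no measured numbers from the PR are implied.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Hypothetical model: seconds until `fd_limit` is reached when each denied
// request leaks one FD. Returns the max value if nothing leaks.
std::uint64_t secondsUntilFdExhaustion(std::uint64_t fd_limit,
                                       std::uint64_t baseline_fds,
                                       std::uint64_t denied_per_second,
                                       bool close_after_zombie) {
  if (baseline_fds >= fd_limit) {
    return 0;  // already at the limit
  }
  // With the fix, the connection is closed and no FD is stranded.
  const std::uint64_t leaked_per_second = close_after_zombie ? 0 : denied_per_second;
  if (leaked_per_second == 0) {
    return std::numeric_limits<std::uint64_t>::max();
  }
  return (fd_limit - baseline_fds) / leaked_per_second;
}
```

For example, with a hypothetical limit of 65536 FDs, 1536 already in use and 100 denied requests per second, the leaky path reaches the limit after 640 seconds, while the fixed path never does.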

The fix adds checkForDeferredClose() calls after zombie stream destruction in both onCodecEncodeComplete() and onCodecLowLevelReset().
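
The shape of the fix can be sketched in the same toy model: both codec callbacks destroy the zombie and then re-run the close check. This is an illustration, not the patch itself; `FixedConnManager` and the shared helper `destroyZombieAndMaybeClose()` are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstddef>

enum class DrainState { NotDraining, Draining, Closing };

// Hypothetical model of the fix; not Envoy's code.
struct FixedConnManager {
  DrainState drain_state = DrainState::NotDraining;
  std::size_t active_streams = 0;
  bool zombie_pending = false;
  bool connection_open = true;

  void checkForDeferredClose() {
    if (drain_state == DrainState::Closing && active_streams == 0) {
      connection_open = false;
    }
  }

  void doDeferredStreamDestroy() { --active_streams; }

  // Hypothetical shared tail: destroy the zombie, then re-check whether the
  // connection can close now that one fewer stream exists.
  void destroyZombieAndMaybeClose() {
    if (!zombie_pending) return;
    zombie_pending = false;
    doDeferredStreamDestroy();
    checkForDeferredClose();
  }

  void onCodecEncodeComplete() { destroyZombieAndMaybeClose(); }
  void onCodecLowLevelReset() { destroyZombieAndMaybeClose(); }
};
```

A connection that is not draining still stays open after the zombie is destroyed, so keep-alive reuse is unaffected in this model.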
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]

repokitteh-read-only · 2026-01-05T16:43:36Z

As a reminder, PRs marked as draft will not be automatically assigned reviewers,
or be handled by maintainer-oncall triage.

Please mark your PR as ready when you want it to be reviewed!

🐱

Caused by: #42859 was opened by wdauchy.


botengyao · 2026-01-06T19:37:48Z

@KBaichoo Kevin, do you also want to take a look at this?

botengyao

Thanks for fixing this; I left a high-level question.

/wait

source/common/http/conn_manager_impl.cc

Signed-off-by: William Dauchy <william.dauchy@datadoghq.com>

wdauchy · 2026-01-08T12:39:19Z

/retest transients

botengyao

/wait

botengyao · 2026-01-08T21:04:34Z

source/common/http/conn_manager_impl.cc

  if (state_.is_zombie_stream_) {
+    const bool skip_delay = shouldSkipDeferredCloseDelay();
    connection_manager_.doDeferredStreamDestroy(*this);
+    // After destroying a zombie stream, check if the connection should be


Do you mind adding a runtime guard and a release note? I think this is a high-risk change, and since the release is next week, would merging it after early next week work for you?
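
A runtime guard keeps the old code path reachable so the new behaviour can be switched off without a rebuild. The sketch below only illustrates that gating pattern: the flag name is hypothetical (the thread does not show the guard's actual name), and the lookup is a stubbed map, not Envoy's runtime.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for a runtime feature store; the flag name is hypothetical.
std::map<std::string, bool> runtime_flags = {
    {"example.reloadable_features.close_after_zombie_destroy", true},
};

bool runtimeFeatureEnabled(const std::string& name) {
  auto it = runtime_flags.find(name);
  return it != runtime_flags.end() && it->second;
}

// Whether the connection is closed after a zombie stream is destroyed while
// the connection is Closing with `remaining_streams` streams left.
bool closesAfterZombieDestroy(bool closing, int remaining_streams) {
  if (!runtimeFeatureEnabled("example.reloadable_features.close_after_zombie_destroy")) {
    return false;  // guard off: old behaviour, no close check
  }
  return closing && remaining_streams == 0;  // guard on: new behaviour
}
```

Defaulting the flag to `true` ships the fix enabled while still letting operators flip it off if it misbehaves.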

KBaichoo

Nice work, looks good to me

/wait

KBaichoo · 2026-01-09T00:54:50Z

source/common/http/conn_manager_impl.cc

  if (state_.is_zombie_stream_) {
+    const bool skip_delay = shouldSkipDeferredCloseDelay();
    connection_manager_.doDeferredStreamDestroy(*this);
+    // After destroying a zombie stream, check if the connection should be


Signed-off-by: William Dauchy <william.dauchy@datadoghq.com>

repokitteh-read-only · 2026-01-09T08:28:33Z

CC @envoyproxy/runtime-guard-changes: FYI only for changes made to (source/common/runtime/runtime_features.cc).

🐱

Caused by: #42859 was synchronize by wdauchy.


wdauchy · 2026-01-09T09:42:30Z

/retest transients

wdauchy force-pushed the fd_leak branch from 38a30e7 to cc7cb81 January 5, 2026 17:30

wdauchy force-pushed the fd_leak branch from cc7cb81 to f476640 January 5, 2026 17:52

botengyao assigned KBaichoo Jan 7, 2026

botengyao reviewed Jan 7, 2026

source/common/http/conn_manager_impl.cc (outdated, resolved)

repokitteh-read-only bot added the waiting label Jan 7, 2026

botengyao self-assigned this Jan 7, 2026

wdauchy marked this pull request as ready for review January 7, 2026 18:16

repokitteh-read-only bot removed the waiting label Jan 7, 2026

wdauchy force-pushed the fd_leak branch 6 times, most recently from 495f026 to c96a9be January 8, 2026 08:42

wdauchy requested a review from botengyao January 8, 2026 10:34

keep same logic for delayed close

296a00a

Signed-off-by: William Dauchy <william.dauchy@datadoghq.com>

wdauchy force-pushed the fd_leak branch from c96a9be to 296a00a January 8, 2026 11:05

botengyao reviewed Jan 8, 2026


repokitteh-read-only bot added the waiting label Jan 8, 2026

KBaichoo previously approved these changes Jan 9, 2026


add runtime guard

dcd846f

Signed-off-by: William Dauchy <william.dauchy@datadoghq.com>

wdauchy dismissed KBaichoo’s stale review via dcd846f January 9, 2026 08:28

repokitteh-read-only bot removed the waiting label Jan 9, 2026


3 participants
