-
Notifications
You must be signed in to change notification settings - Fork 5.2k
http: fix potential FD leak when zombie streams prevent connection close #42859
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
I am chasing a case where a client sends POST messages on a HTTP1.1 connection, but is being systematically denied. The connection is immediately reused by the client. In some case we seem to trigger a racy case where the number of open file descriptor explodes. When a stream becomes a zombie (waiting for codec encode completion) and the connection is in DrainState::Closing, the connection is not closed because checkForDeferredClose() is not called after the zombie stream was finally destroyed. This commonly occurs with HTTP/1.1 connections when: - ext_authz (or similar filter) denies a request early, sending a 403 response before the full request body is received - The connection is (potentially) marked DrainState::Closing - The stream becomes a zombie waiting for the codec to finish encoding - When onCodecEncodeComplete() fires, doDeferredStreamDestroy() was called but checkForDeferredClose() was not, leaving the connection open In scenarios where clients try to reuse the same HTTP/1.1 connection and requests are systematically denied (e.g., by ext_authz), this causes rapid FD exhaustion as each denied request leaves an orphaned connection. The fix adds checkForDeferredClose() calls after zombie stream destruction in both onCodecEncodeComplete() and onCodecLowLevelReset(). Signed-off-by: William Dauchy <william.dauchy@datadoghq.com>
|
@KBaichoo, Kevin, do you want also to take a look for this? |
botengyao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for fixing this, and left a high level question.
/wait
495f026 to
c96a9be
Compare
Signed-off-by: William Dauchy <william.dauchy@datadoghq.com>
|
/retest transients |
botengyao
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/wait
| if (state_.is_zombie_stream_) { | ||
| const bool skip_delay = shouldSkipDeferredCloseDelay(); | ||
| connection_manager_.doDeferredStreamDestroy(*this); | ||
| // After destroying a zombie stream, check if the connection should be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind adding a runtime guard and a release note? I think this will be a high risk change since the release will be next week, does merging it after early next week work for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated
KBaichoo
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice work, looks good to me
/wait
| if (state_.is_zombie_stream_) { | ||
| const bool skip_delay = shouldSkipDeferredCloseDelay(); | ||
| connection_manager_.doDeferredStreamDestroy(*this); | ||
| // After destroying a zombie stream, check if the connection should be |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1
Signed-off-by: William Dauchy <william.dauchy@datadoghq.com>
|
/retest transients |
Commit Message:
I am chasing a case where a client sends POST messages on a HTTP1.1 connection, but is being systematically denied. The connection is immediately reused by the client. In some case we seem to trigger a racy case where the number of open file descriptor explodes.
When a stream becomes a zombie (waiting for codec encode completion) and the connection is in DrainState::Closing, the connection is not closed because checkForDeferredClose() is not called after the zombie stream was finally destroyed.
This commonly occurs with HTTP/1.1 connections when:
In scenarios where clients try to reuse the same HTTP/1.1 connection and requests are systematically denied (e.g., by ext_authz), this causes rapid FD exhaustion as each denied request leaves an orphaned connection.
The fix adds checkForDeferredClose() calls after zombie stream destruction in both onCodecEncodeComplete() and onCodecLowLevelReset().
Additional Description:
Risk Level:
Testing:
Docs Changes:
Release Notes:
Platform Specific Features:
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Fixes commit #PR or SHA]
[Optional Deprecated:]
[Optional API Considerations:]