[opt](agent-task) Add a daemon thread to clean up agent tasks on dead BEs #57591
TPC-DS: Total hot run time: 189681 ms
ClickBench: Total hot run time: 27.43 s
TPC-DS: Total hot run time: 188815 ms
ClickBench: Total hot run time: 27.41 s
FE Regression Coverage Report: Increment line coverage
TPC-H: Total hot run time: 34515 ms
TPC-DS: Total hot run time: 188569 ms
ClickBench: Total hot run time: 27.38 s
```java
public class AgentTaskCleanupDaemon extends MasterDaemon {
    private static final Logger LOG = LogManager.getLogger(AgentTaskCleanupDaemon.class);

    public static final Integer MAX_FAILURE_TIMES = 3;
```
3 is too small; the heartbeat interval is sometimes only 5 seconds.
Actually, a config expressed in seconds would be more readable and easier for users to understand.
> 3 is too small, heartbeat interval is 5 seconds sometimes.

But `Config.agent_task_health_check_intervals_ms` is actually 5 minutes by default, so reaching `MAX_FAILURE_TIMES` (3 failed checks) means the BE has been unavailable for more than 10 minutes.
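The arithmetic behind this exchange can be sketched as a small hypothetical helper (not Doris code). It assumes health checks run at a fixed interval and that the failure counter only reaches the threshold after consecutive failed checks, so n failed checks span at least (n - 1) * interval. The second method inverts that relation for the reviewer's seconds-based suggestion: given a timeout in seconds, it returns the smallest failure count whose window covers it.

```java
// Hypothetical helper illustrating the failure-count arithmetic; not part of Doris.
// Assumes checks run every intervalMs and failures are counted consecutively.
final class FailureWindow {
    private FailureWindow() {}

    /** Minimum time a BE has been failing after maxFailures checks, spaced intervalMs apart, all failed. */
    static long minUnavailableMs(int maxFailures, long intervalMs) {
        if (maxFailures < 1 || intervalMs <= 0) {
            throw new IllegalArgumentException("maxFailures must be >= 1 and intervalMs > 0");
        }
        return (maxFailures - 1) * intervalMs;
    }

    /** Smallest failure count whose window covers timeoutSeconds: ceil(timeout / interval) + 1. */
    static int failuresForTimeout(long timeoutSeconds, long intervalMs) {
        if (timeoutSeconds < 0 || intervalMs <= 0) {
            throw new IllegalArgumentException("timeoutSeconds must be >= 0 and intervalMs > 0");
        }
        long timeoutMs = timeoutSeconds * 1000L;
        long intervals = (timeoutMs + intervalMs - 1) / intervalMs; // ceiling division
        return Math.toIntExact(intervals + 1);
    }

    public static void main(String[] args) {
        long fiveMinutesMs = 5 * 60 * 1000L;
        long fiveSecondsMs = 5 * 1000L;
        System.out.println(minUnavailableMs(3, fiveMinutesMs));     // 600000 (10 minutes)
        System.out.println(minUnavailableMs(3, fiveSecondsMs));     // 10000 (10 seconds)
        System.out.println(failuresForTimeout(600, fiveMinutesMs)); // 3
        System.out.println(failuresForTimeout(600, fiveSecondsMs)); // 121
    }
}
```

With the 5-minute default, three failures cover 10 minutes; with a 5-second interval the same count covers only 10 seconds, which is the reviewer's concern. A seconds-based setting keeps the window fixed regardless of the interval.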
PR approved by at least one committer and no changes requested.
dataroaring left a comment:
LGTM
… BEs (apache#57591) When a BE is down, its tasks never finish until they time out. Fix this by adding a daemon thread that cleans them up.
What problem does this PR solve?
Problem Summary:
When a BE goes down, the agent tasks assigned to it never finish until they time out. Fix this problem by adding a daemon thread that cleans up those tasks.
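A minimal sketch of the idea, under stated assumptions: a hypothetical `LongPredicate` liveness probe stands in for the FE's BE health check, and an in-memory map stands in for the agent task queue. All class and method names here are illustrative, not Doris APIs; the real `AgentTaskCleanupDaemon` extends Doris's `MasterDaemon` rather than managing its own executor. It also assumes a successful check resets the failure counter, so only consecutive failures count toward `MAX_FAILURE_TIMES`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongPredicate;

// Hypothetical sketch of a dead-BE task cleanup daemon; names are illustrative, not Doris APIs.
final class TaskCleanupSketch {
    static final int MAX_FAILURE_TIMES = 3;

    // backendId -> signatures of tasks still waiting on that BE (stand-in for the FE task queue)
    private final Map<Long, List<Long>> pendingTasks = new HashMap<>();
    // backendId -> consecutive failed health checks (assumption: success resets it)
    private final Map<Long, Integer> failureCounts = new HashMap<>();
    private final LongPredicate isBackendAlive;

    TaskCleanupSketch(LongPredicate isBackendAlive) {
        this.isBackendAlive = isBackendAlive;
    }

    synchronized void addTask(long backendId, long signature) {
        pendingTasks.computeIfAbsent(backendId, id -> new ArrayList<>()).add(signature);
    }

    synchronized List<Long> pending(long backendId) {
        return new ArrayList<>(pendingTasks.getOrDefault(backendId, new ArrayList<>()));
    }

    /** One health-check round; returns the signatures of tasks removed for dead BEs. */
    synchronized List<Long> runOnce() {
        List<Long> removed = new ArrayList<>();
        Iterator<Map.Entry<Long, List<Long>>> it = pendingTasks.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Long, List<Long>> entry = it.next();
            long backendId = entry.getKey();
            if (isBackendAlive.test(backendId)) {
                failureCounts.remove(backendId); // BE is healthy: reset its counter
                continue;
            }
            int failures = failureCounts.merge(backendId, 1, Integer::sum);
            if (failures >= MAX_FAILURE_TIMES) {
                removed.addAll(entry.getValue()); // a real FE would also mark these tasks failed
                it.remove();
                failureCounts.remove(backendId);
            }
        }
        return removed;
    }

    /** Runs runOnce() periodically on a daemon thread so it never blocks JVM shutdown. */
    ScheduledExecutorService start(long intervalMs) {
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "agent-task-cleanup");
            t.setDaemon(true);
            return t;
        });
        executor.scheduleWithFixedDelay(this::runOnce, intervalMs, intervalMs, TimeUnit.MILLISECONDS);
        return executor;
    }
}
```

With a probe that reports BE 2 as down, the first two `runOnce()` calls remove nothing and the third returns both of BE 2's task signatures, while BE 1's task stays queued. Requiring several consecutive failures, rather than acting on the first one, avoids dropping tasks on a BE that only missed a single check.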
Release note
None
Check List (For Author)
Test
Behavior changed:
Does this need documentation?
Check List (For Reviewer who merges this PR)