ESET_Protect: clear cursor on 200 OK using empty cursor object
#15831
💚 CLA has been signed
Pinging @elastic/security-service-integrations (Team:Security-Service Integrations)
- "page_size": state.page_size
+ "page_size": state.page_size,
+ "cursor": {
+     "response_id": null
This is not necessary; the cursor object will replace the existing object, blatting out the response_id field.
Thanks for the review. A quick clarification on why this change is necessary.
The “blatting out” behaviour only takes effect when the CEL program returns a cursor object in that evaluation. In the current device_task code path for 200 OK the program does not return any cursor field at all, so the previous cursor is retained:
resp.StatusCode == 200 ?
    {
        "events": ...,
        "page_token": ...,
        "want_more": ...,
        "page_size": state.page_size
    }
:
    ...
The 202 Accepted branch does set a cursor holding the Response-Id:
state.with({
"cursor": { "response_id": resp.Header["Response-Id"][0] }
})
Because the 200 OK branch omitted cursor, the existing cursor object (including the cached response_id) persisted across polls. Subsequent requests kept sending that stale header and the backend returned 404 Not Found.
This PR fixes that by making the 200 OK path return:
"cursor": { "response_id": null }
which explicitly clears the stale value. This mirrors the detection data stream (see integrations/packages/eset_protect/data_stream/detection/agent/stream/cel.yml.hbs), where the 200 OK already resets the response_id. After applying this change the 404s disappeared and the input has remained healthy.
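The retention behaviour described in this comment can be modelled outside the integration. The following is a minimal Python sketch, not the CEL input's actual code: `persist_cursor` and `usable_response_id` are hypothetical helpers, the id `abc-123` is made up, and the sketch assumes (as the comment states) that an evaluation result without a `cursor` key leaves the stored cursor untouched.

```python
def persist_cursor(stored_cursor, result):
    """Hypothetical model: keep the old cursor unless the result carries one."""
    if "cursor" in result:
        return result["cursor"]
    return stored_cursor


def usable_response_id(cursor):
    """Return the response_id that would be sent as a header, or None."""
    return (cursor or {}).get("response_id")


# 202 Accepted: the program returns a cursor holding the Response-Id.
cursor = persist_cursor(None, {"cursor": {"response_id": "abc-123"}})

# 200 OK without a cursor field: the stale id survives.
stale = persist_cursor(cursor, {"events": [], "page_size": 100})

# 200 OK with an explicit reset: the id is cleared.
cleared = persist_cursor(
    cursor,
    {"events": [], "page_size": 100, "cursor": {"response_id": None}},
)

print(usable_response_id(stale))    # abc-123
print(usable_response_id(cleared))  # None
```

In this model the only difference between the two 200 OK results is whether a `cursor` key is present, which is the distinction the comment draws.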
I understand the reason that it's there, but the expression is more complex than is required to achieve that goal.
You have this:
-- src.cel --
state.with(
{
"cursor": {"response_id": null},
}
)
-- data.json --
{
"cursor": {
"response_id": 42
}
}
-- out.json --
{
"cursor": {
"response_id": null
}
}
with the goal that there be no usable value in state.cursor.response_id, but this also works:
-- src.cel --
state.with(
{
"cursor": {},
}
)
-- data.json --
{
"cursor": {
"response_id": 42
}
}
-- out.json --
{
"cursor": {}
}
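The two cases above can also be checked with a small Python model. This is a sketch under the assumption that `with` behaves as a shallow merge, which is what the two outputs indicate; `with_` is a hypothetical stand-in written for illustration, not the CEL function itself.

```python
def with_(state, update):
    """Shallow merge: top-level keys in update replace those in state."""
    merged = dict(state)
    merged.update(update)
    return merged


state = {"page_size": 100, "cursor": {"response_id": 42}}

explicit_null = with_(state, {"cursor": {"response_id": None}})
empty_cursor = with_(state, {"cursor": {}})

# Both variants leave no usable value under cursor.response_id.
print(explicit_null["cursor"].get("response_id"))  # None
print(empty_cursor["cursor"].get("response_id"))   # None

# The merge is shallow: the whole cursor object is replaced, not patched.
print(empty_cursor["cursor"])  # {}
```

Because the replacement happens at the top-level key, any other field that lived under `cursor` would also be dropped by either variant.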
Thanks for the suggestion. Agreed, "cursor": {} is sufficient here and simpler.
In device_task the 200 OK branch doesn’t need to preserve any other cursor fields, so I will update the code from:
"page_size": state.page_size,
"cursor": { "response_id": null }
to:
"page_size": state.page_size,
"cursor": {}
As you suggest, this should fully reset the cursor so no stale Response-Id gets sent on subsequent polls.
Change applied in: 52c7d53
/test
💚 Build Succeeded
Package eset_protect - 1.11.1 containing this change is available at https://epr.elastic.co/package/eset_protect/1.11.1/
…evice_task stream (elastic#15831)
The ESET Automation API sometimes replies with 202 Accepted and a Response-Id header, which must be sent with the next poll request. Once the task completes and the API returns 200 OK, that Response-Id becomes invalid.
In version 1.10.0, the integration introduced caching of this Response-Id but did not clear it after a successful 200 OK response. As a result, subsequent requests continued to send the stale header, leading to recurring 404 Not Found errors and degraded input health.
This patch clears the cached Response-Id after each 200 OK, ensuring the header is only included when valid.
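The 202, 200, 404 sequence described in the commit message can be reproduced with a small simulation. This is a hedged Python sketch, not the integration's code: `FakeDeviceTasksAPI` is a hypothetical stand-in for the endpoint that always answers a fresh request with 202 (the real API does so only sometimes), and `poll` is an illustrative client loop with a switch for clearing the cursor on 200 OK.

```python
class FakeDeviceTasksAPI:
    """Hypothetical endpoint: 202 with an id, then 200; 404 on a stale id."""

    def __init__(self):
        self.pending_id = None
        self.calls = 0

    def get(self, response_id=None):
        self.calls += 1
        if response_id is not None:
            if response_id != self.pending_id:
                return 404, None      # stale or unknown id
            self.pending_id = None    # result delivered; the id is now invalid
            return 200, None
        # Fresh request: the result is not ready yet, so hand out an id.
        self.pending_id = f"id-{self.calls}"
        return 202, self.pending_id


def poll(api, rounds, clear_on_200):
    """Poll the fake API and return the list of status codes seen."""
    cursor = {}
    statuses = []
    for _ in range(rounds):
        status, new_id = api.get(cursor.get("response_id"))
        statuses.append(status)
        if status == 202:
            cursor = {"response_id": new_id}
        elif status == 200 and clear_on_200:
            cursor = {}               # the fix: drop the cached id
    return statuses


print(poll(FakeDeviceTasksAPI(), 4, clear_on_200=False))  # [202, 200, 404, 404]
print(poll(FakeDeviceTasksAPI(), 4, clear_on_200=True))   # [202, 200, 202, 200]
```

Without the reset the client stays stuck on 404 after the first successful response; with it, each new poll starts without a Response-Id.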
Proposed commit message
WHAT
This PR updates the CEL program for the device_task data stream to correctly handle the Response-Id header returned by the ESET API. When the API responds with 202 Accepted, the result is still being prepared, and the response includes a Response-Id that must be sent with the next request. Once the task completes and the API responds with 200 OK, that ID becomes invalid, as it is no longer needed. Previously, the integration kept sending this outdated ID with subsequent requests, even after a successful response, which caused the API to return 404 Not Found errors and led to degraded input health. With this patch, the cached Response-Id is cleared after every 200 OK, so the header is only sent while it is valid.
WHY
In production we got this error:
[screenshot of the error]
Version 1.10.0 of eset_protect introduced a bug: the integration did not clear its cached state when transitioning from a 202 Accepted response to a 200 OK response. The error lies in a snippet added by the author of this commit. That change added a new cursor field containing the Response-Id header, which was then cached between requests.
However, the cached Response-Id was never cleared after a successful 200 OK response. As a result, subsequent polls kept sending a stale Response-Id header, which the ESET API could not process, so it returned 404 Not Found.
TL;DR
GET /v1/device_tasks in ESET Automation sometimes replies with 202 Accepted and a Response-Id header. Clients must send that header on the next poll to retrieve the cached result. Once the service returns 200 OK, the header must no longer be sent.
In our production environment, applying these changes and manually adding the package removed the recurring 404 Not Found errors, and the input has remained healthy for an extended period.