Expand Elasticsearch Search timeout #3531
Changes from all commits
5c46e2f
6999255
ab8f318
20855fa
c39d785
a84ff59
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -172,23 +172,59 @@ The response includes an aggregation based on the `day_of_week` runtime field. U | |
|
|
||
| ## Search timeout [search-timeout] | ||
|
|
||
| By default, search requests don't time out. The request waits for complete results from each shard before returning a response. | ||
| Search requests do not time out by default. The request waits for complete results from every shard before returning a response as outlined in the [basic read model](/deploy-manage/distributed-architecture/reading-and-writing-documents.md#_basic_read_model). | ||
|
|
||
| While [async search](async-search-api.md) is designed for long-running searches, you can also use the `timeout` parameter to specify a duration you'd like to wait on each shard to complete. Each shard collects hits within the specified time period. If collection isn't finished when the period ends, {{es}} uses only the hits accumulated up to that point. The overall latency of a search request depends on the number of shards needed for the search and the number of concurrent shard requests. | ||
| To make a search request time out on a best-effort basis, use its [`timeout` setting](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search), provided either as a query parameter or within the request body. If the timeout elapses before the search finishes on a given shard, that shard stops where it is and returns partial results. The `timeout` setting: | ||
|
|
||
| * Checks for duration expiration on a per shard basis. | ||
| * Performs cancellations along a shard's segment boundaries; therefore, large segments may delay cancellation. | ||
| * Compares the timeout duration against the duration of the search `query` phase. This implies that it does not include time spent in the following: | ||
| * the internet network or the inter-node transport network | ||
| * the coordinating node's wrapping task | ||
| * [thread pool queue](/troubleshoot/elasticsearch/task-queue-backlog.md#diagnose-task-queue-thread-pool) | ||
| * [`fetch` phase](elasticsearch://reference/elasticsearch/rest-apis/search-profile.md#profiling-fetch) | ||
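A minimal sketch of the two ways to pass `timeout` described above (request body or query parameter), in Python using only the standard library. The host `http://localhost:9200`, the index name, and the helper `build_search_request` are illustrative assumptions and not part of the documented change. The request is built but never sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_search_request(index, query, timeout, in_body=True,
                         host="http://localhost:9200"):
    """Build (but do not send) a _search request with a per-shard timeout.

    `host` and this helper are hypothetical; only the `timeout` parameter
    itself comes from the search API.
    """
    body = {"query": query}
    url = f"{host}/{index}/_search"
    if in_body:
        # Option 1: timeout inside the request body.
        body["timeout"] = timeout
    else:
        # Option 2: timeout as a query parameter.
        url += "?" + urlencode({"timeout": timeout})
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


query = {"match": {"user.id": "kimchy"}}
body_req = build_search_request("my-index-000001", query, "2s")
param_req = build_search_request("my-index-000001", query, "2s", in_body=False)
print(body_req.full_url, body_req.data.decode())
print(param_req.full_url, param_req.data.decode())
```

Either form asks each shard to stop collecting hits once the duration elapses. Neither bounds the total latency of the request, for the reasons listed in the bullets above.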
|
|
||
| You can set a cluster-wide default `timeout` for all search requests with the `search.default_search_timeout` cluster setting. It defaults to `-1`, which means no timeout. This cluster-wide timeout is used as a fallback when a search request does not specify its own `timeout`. You can set it to a [time unit](elasticsearch://reference/elasticsearch/rest-apis/api-conventions.md#time-units) value using the [update cluster settings API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cluster-put-settings). If this default timeout expires before a search request finishes, the request is cancelled using the [task cancellation API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-tasks-cancel). For example: | ||
|
|
||
| ```console | ||
| GET /my-index-000001/_search | ||
| PUT /_cluster/settings | ||
| { "persistent": { "search.default_search_timeout": "5m" } } | ||
| ``` | ||
|
|
||
| :::{note} | ||
| The resolution of the `search.default_search_timeout` setting depends on the expert setting `thread_pool.estimated_time_interval`, which defaults to `200ms`. The smallest meaningful value for `search.default_search_timeout` is therefore also `200ms`. Elastic recommends against overriding this expert setting because it has far-reaching impact. | ||
| ::: | ||
|
|
||
| The `search.default_search_timeout` cluster setting only applies to the current cluster and does not cascade during [cross-cluster search (CCS)](/solutions/search/cross-cluster-search.md). Configure each remote cluster individually as appropriate for your use case. | ||
|
|
||
| ### Example [search-timeout-example] | ||
|
|
||
| To demonstrate the impact of the `timeout` parameter, consider an [async search](async-search-api.md). Async searches are designed for long-running searches, but you can use the `timeout` parameter to specify a duration you'd like to wait on each shard to complete. The overall latency of a search request depends on the number of shards needed for the search and the number of concurrent shard requests. Each shard collects hits within the specified time period. If collection isn't finished when the period ends, {{es}} uses only the hits accumulated up to that point. | ||
|
|
||
|
I am not sure that async search is the best example for this. This is generally used in scenario where latency is crucial.

It is the existing example (although it only had a request body no response body); 🤷‍♂️ I have no horse in the race but also it doesn't really affect the response body example IMO.
I may need your help with a "more valid" example. This was simplified patterned from this real life which AFAICT looks par (that shards all "successful" but still

I am just proposing to change `_async_search` to `_search`, the example is fine. |
||
| For example, if the `timeout` duration is exceeded on at least one shard, the API returns an HTTP 200 status code, but the `timed_out` field in the response body reports `true`. | ||
|
|
||
| ```json | ||
| { | ||
| "timeout": "2s", | ||
| "query": { | ||
| "match": { | ||
| "user.id": "kimchy" | ||
| } | ||
| "took" : 11, | ||
| "timed_out" : true, | ||
| "_shards" : { | ||
| "total" : 40, | ||
| "successful" : 40, | ||
| "skipped" : 0, | ||
| "failed" : 0 | ||
| }, | ||
| "hits" : { | ||
| "total" : { | ||
| "value" : 98393, // possibly incomplete value | ||
| "relation" : "eq" | ||
| }, | ||
|
|
||
| // ... | ||
| } | ||
| } | ||
| ``` | ||
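Because a timed-out search still returns HTTP 200, a client has to inspect the response body to notice partial results. The following Python sketch shows one way to do that. The helper name `summarize_search_response` and the shape of its return value are assumptions made for illustration; the sample response mirrors the fields shown in the example above.

```python
def summarize_search_response(response):
    """Report whether a search response may contain partial results.

    Hypothetical helper: it only reads the `timed_out` and `_shards.failed`
    fields of a search response that has already been parsed into a dict.
    """
    shards = response.get("_shards", {})
    timed_out = bool(response.get("timed_out", False))
    failed = shards.get("failed", 0)
    return {
        "timed_out": timed_out,
        "failed_shards": failed,
        # Results can be partial even when every shard reports "successful".
        "possibly_partial": timed_out or failed > 0,
    }


response = {
    "took": 11,
    "timed_out": True,
    "_shards": {"total": 40, "successful": 40, "skipped": 0, "failed": 0},
    "hits": {"total": {"value": 98393, "relation": "eq"}},
}

summary = summarize_search_response(response)
if summary["possibly_partial"]:
    print("search returned partial results; hit counts may be incomplete")
```

Note that the shard counts alone are not enough here: all 40 shards are reported as successful, and only `timed_out` reveals that the hit total may be incomplete.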
|
|
||
| To set a cluster-wide default timeout for all search requests, configure `search.default_search_timeout` using the [cluster settings API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-cluster-put-settings). This global timeout duration is used if no `timeout` argument is passed in the request. If the global search timeout expires before the search request finishes, the request is cancelled using [task cancellation](https://www.elastic.co/docs/api/doc/elasticsearch/group/endpoint-tasks). The `search.default_search_timeout` setting defaults to `-1` (no timeout). | ||
| If a particular search request should return an error instead of partial results, consider also overriding the `search.default_allow_partial_results` cluster setting for that request by setting the [`allow_partial_search_results` parameter](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-search) to `false`. | ||
|
|
||
|
|
||
| ## Search cancellation [global-search-cancellation] | ||
|
|
||
This is still quite hairy ;)
What about something like:

Thanks for the feedback! This is delineated because it's what Support spends time clarifying from the doc not currently saying (specifically the whole bullet list of what doesn't qualify, which is actually what started this whole PR flow).
FWIW maybe that's part of what @leemthompo's comment about language clean up

It's too detailed and not up to date to my taste. The only thing that we should clarify is that it's applied per shard and we check it on a best effort basis every `thread_pool.estimated_time_interval`.