Extend max token stale period buffer with dynamic computation based on TTL#1293
Conversation
… short-lived tokens. Introduce a max stale period of 20 minutes (previously a flat of 5 minutes for all token types), that dinamically scales down based on the token's TTL. The stale period is now computed at token acquisition time as: min(TTL x 0.5, max_stale_period).
Modify the MockRefreshable to work with the dynamic stale period. You can now override the dynamic refresh, but otherwise it will default to it. Add test_dynamic_stale_period test which verifies the whole fresh -> stale -> async refresh -> expired -> blobking refresh lifecycle for each of the three types of expected token lifespans (90 mins, 10 mins, 90 sec).
Signed-off-by: mihaimitrea-db <mihai.mitrea@databricks.com>
When the TTL is < 0 (the token is expired) the stale period is set to 0 to avoid having a negative stale period.
…fer_extension' into auth_token_buffer_extension
…ment Reintroduce the stale_period attribute in the Refreshable constructor for backwards compatibility as an optional field. If it is present default back to the legacy fixed stale period of 5 minutes. If it is not present the dynamic stale period based on the token's TTL is used. The _use_dynamic_stale_duration flag is used to differentiate between these two methods.
| # Config properties | ||
| self._stale_duration = stale_duration | ||
| self._use_dynamic_stale_duration = stale_duration is None | ||
| self._stale_duration = stale_duration if stale_duration is not None else timedelta(seconds=0) |
There was a problem hiding this comment.
[optional] Wondering if it won't be simpler to make this a timestamp rather than a duration.
| """ | ||
| For each token type, verifies: | ||
| - Stale tokens trigger async refresh (returns old token immediately) | ||
| - Expired tokens trigger blocking refresh (blocks and returns new token) | ||
| """ |
There was a problem hiding this comment.
Can we separate this into two tests? Also, is there a way we could avoid waiting for 1 second in the loop?
There was a problem hiding this comment.
Good point, 1 second is too much. I think without changing the implementation of blocking_token() I can drop this to 0.1.
Does that work?
There was a problem hiding this comment.
Discussed offline, let's proceed with the code as it is and maybe follow-up with using threading to wait exactly the right amount of time. The goal here is to make sure that we are most wasting seconds on a test suite that should run in 2 digits milliseconds.
Address PR feedback by: - Splitting test_dynamic_stale_period into two tests: - test_dynamic_stale_period_blocking_refresh: expired token triggers blocking refresh - test_dynamic_stale_period_async_refresh: stale token triggers async refresh - Reducing the post-unblock wait from 1s to 0.1s Full removal of the wait (e.g. via threading or explicit event) to be done in a follow-up.
|
If integration tests don't run automatically, an authorized user can run them manually by following the instructions below: Trigger: Inputs:
Checks will be approved automatically on success. |
Summary
Implement dynamic token stale period for improved auth reliability
Extends the token refresh buffer from a fixed 5 minutes to a dynamic period that adapts to token lifetime, improving reliability across different token types while maintaining backward compatibility.
The TTL is computed based on the time to live at the moment the token is received and not it's real TTL. For example, if the token has a TTL of 60 mins, but has been acquired 20 mins before being used by the SDK (e.g. though the CLI), it's TTL will be 60 - 20 = 40 mins.
Changes:
Backward compatibility:
Implementation:
Token behavior examples:
Testing: