[telemetry] Add retry logic for partial telemetry push failures#1224
[telemetry] Add retry logic for partial telemetry push failures#1224akgitrepos wants to merge 3 commits intodatabricks:mainfrom
Conversation
|
@gopalldb @vikrantpuppala Hi, I've implemented retry logic for partial telemetry push failures which was one of the TODO item. Would appreciate a review. Thanks! |
samikshya-db
left a comment
There was a problem hiding this comment.
Thanks for the great addition here, @akgitrepos ! 🥳
Added a few comments.
|
|
||
| @Override | ||
| public void pushEvent(TelemetryRequest request) throws Exception { | ||
| pushEventWithRetry(request, maxRetries); |
There was a problem hiding this comment.
nit : can you move the contents of pushEventWithRetry in this function itself?
There was a problem hiding this comment.
Refactored the recursive pushEventWithRetry into pushEvent with an iterative approach.
| private static final int DEFAULT_MAX_RETRIES = 3; | ||
| private static final long INITIAL_BACKOFF_MS = | ||
| 1000; // 1 second - matches DatabricksHttpRetryHandler | ||
| private static final long MAX_BACKOFF_MS = 10000; // 10 seconds - matches codebase standard |
There was a problem hiding this comment.
The best way would be to extract calculateExponentialBackoff and doSleepForDelay from DatabricksHttpRetryHandler as an util, and reuse it here.
The reason I was saying this is because the backoff logic and thread interruption logic is more mature there. (Also to honor DRY)
There was a problem hiding this comment.
Created RetryUtil shared utility class. Both DatabricksHttpRetryHandler and TelemetryPushClient now reuse the same backoff calculation and thread-sleep logic.
Implements retry logic for partial telemetry push failures. When the telemetry service returns a partial success response, the client now automatically retries up to 3 times with exponential backoff. This addresses the TODO comment in TelemetryPushClient.java. Signed-off-by: Akshey Sigdel <sigdelakshey@gmail.com> # Conflicts: # src/test/java/com/databricks/jdbc/telemetry/TelemetryPushClientTest.java
Signed-off-by: Akshey Sigdel <sigdelakshey@gmail.com>
cf71661 to
576535f
Compare
|
@samikshya-db Any additional feedback on this? |
Description
Implements retry logic for partial telemetry push failures in
TelemetryPushClient. When the telemetry service returns a partial success response (numProtoSuccess < totalEvents), the client now automatically retries up to 3 times with exponential backoff (1s → 2s → 4s, max 10s).This addresses the TODO comment in
TelemetryPushClient.java.Changes Made
maxRetriesparameter (default: 3)DatabricksHttpRetryHandler)Testing
pushEvent_noRetryOnFullSuccess- Verifies no retry when all events succeedpushEvent_retriesOnPartialSuccess- Verifies retry is triggered on partial failureAdditional Notes