Description
What problem does your feature solve?
The metrics currently emitted may not provide enough data to diagnose performance degradation under client request load.
Related to: Client load testing Epic - stellar/stellar-rpc-blaster#2
What would you like to see?
Analyze the performance metrics related to RPC request processing to find gaps, and add new metrics for any gaps found.
Initial analysis revealed:
- Client disconnect: it is currently not possible to tell whether timeout error responses originate on the client side or the server side. Distinguishing the two could also highlight flaky clients that are driving up error rates.
  - HTTP error responses due to client disconnect should return a 504 status.
  - New metric key to focus on timeouts: request_context_cancelled_total{endpoint="getEvents", reason="client_disconnect|deadline_exceeded"}
- Endpoint request duration breakdown: identify how much request time is spent on the db versus other I/O-bound stages.
  - json_rpc_request_duration_seconds{stage="db|xdr|core|other..."}
- Endpoint request parameters: add new logging output that prints a standard line format with the endpoint name and a serialized list of the endpoint's request parameter values. This should enable log analysis, such as from Loki, to derive request profiling. Note: this isn't directly related to performance analysis.
What alternatives are there?