Client Load Testing: enhance metrics output related to client load #574

@sreuland

What problem does your feature solve?

The metrics currently emitted may not provide enough data to diagnose performance degradation under client request load.

Related to: Client load testing Epic - stellar/stellar-rpc-blaster#2

What would you like to see?

Analyze RPC request processing to find gaps in the performance metrics, and add new metrics for any gaps found.

Initial analysis revealed:

  • Client disconnect - There is currently no way to tell whether timeout error responses originate on the client side or the server side. Detecting this could also help highlight flaky clients that are driving up error rates.

    • HTTP Error responses due to client disconnect should return 504 status.
    • new metric key to focus on timeouts - request_context_cancelled_total{endpoint="getEvents", reason="client_disconnect|deadline_exceeded"}
  • Endpoint request duration breakdown - identify how much of the request time is spent on the DB versus other I/O-bound stages

    • json_rpc_request_duration_seconds{stage="db|xdr|core|other..."}
  • Endpoint request parameters - add new logging output that prints a standard line format with the endpoint name and a serialized list of the endpoint's request parameter values. This should enable log analysis tools such as Loki to derive request profiles. Note - this isn't directly related to performance analysis.
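The three items above could be sketched roughly as follows in Go. This is only an illustration under stated assumptions, not the project's implementation: the helper names (`cancelReason`, `stageTimes`, `requestLogLine`) and the `rpc_request endpoint=... params=...` line layout are hypothetical, plain maps stand in for real counters and histograms, and `context.Canceled` is treated as a client disconnect on the assumption that server code never cancels the request context itself.

```go
package main

import (
	"context"
	"encoding/json"
	"errors"
	"fmt"
	"sync"
	"time"
)

// cancelReason maps a request context's error onto the proposed "reason"
// label values. Assumption: the server never cancels the request context
// itself, so context.Canceled implies the client went away.
func cancelReason(ctx context.Context) string {
	err := ctx.Err()
	switch {
	case errors.Is(err, context.DeadlineExceeded):
		return "deadline_exceeded"
	case errors.Is(err, context.Canceled):
		return "client_disconnect"
	default:
		return "" // context is still live
	}
}

// stageTimes accumulates wall time per stage ("db", "xdr", ...). It is a
// hypothetical stand-in for a histogram carrying a "stage" label.
type stageTimes struct {
	mu    sync.Mutex
	total map[string]time.Duration
}

func newStageTimes() *stageTimes {
	return &stageTimes{total: make(map[string]time.Duration)}
}

// observe runs fn and adds its elapsed time to the named stage.
func (s *stageTimes) observe(stage string, fn func()) {
	start := time.Now()
	fn()
	elapsed := time.Since(start)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.total[stage] += elapsed
}

// requestLogLine renders one fixed-format line with the endpoint name and
// its JSON-serialized parameters (hypothetical layout, for log queries).
func requestLogLine(endpoint string, params any) (string, error) {
	b, err := json.Marshal(params)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("rpc_request endpoint=%s params=%s", endpoint, b), nil
}

// demo exercises the helpers with deterministic inputs.
func demo() []string {
	cancelled, cancel := context.WithCancel(context.Background())
	cancel() // simulate the client dropping the connection
	// A deadline in the past expires the context immediately.
	expired, stop := context.WithDeadline(context.Background(), time.Now().Add(-time.Second))
	defer stop()
	line, err := requestLogLine("getEvents", map[string]any{"startLedger": 100, "limit": 10})
	if err != nil {
		line = "error: " + err.Error()
	}
	return []string{cancelReason(cancelled), cancelReason(expired), line}
}

func main() {
	for _, l := range demo() {
		fmt.Println(l)
	}
	st := newStageTimes()
	st.observe("db", func() { time.Sleep(2 * time.Millisecond) })
	fmt.Printf("db stage took %v\n", st.total["db"])
}
```

In a real handler, the value returned by `cancelReason` would presumably become the `reason` label on `request_context_cancelled_total`, and each `observe` call would feed the `stage` label on `json_rpc_request_duration_seconds`. Serializing parameters as JSON keeps the log line machine-parseable, though sensitive or very large parameter values may need truncation before logging.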

What alternatives are there?
