Replies: 4 comments
-
Thanks for the performance data!
It's normal for flushing to take longer if the size of the chunk being sent increases or if the server's response becomes slower.
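For instance, keeping chunks small keeps each bulk request small. The buffer sketch below is only illustrative (the sizes and thread count are assumptions and would need tuning against the actual workload); the trade-off is more, smaller bulk requests against OpenSearch.

  <buffer>
    @type memory
    # Smaller chunks mean each flush sends a smaller bulk request (value is illustrative)
    chunk_limit_size 4MB
    total_limit_size 512MB
    flush_mode interval
    flush_interval 3s
    # More flush threads let several smaller chunks be sent in parallel (illustrative)
    flush_thread_count 4
  </buffer>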
-
Thanks for getting back 🙏 We never faced such flush delays when running Fluentd v1.11 with Elasticsearch 6.8 on EC2. However, with OpenSearch 2.17.1 and Fluentd v1.16.2 deployed on k8s, we're seeing flush latency increase during performance testing. This behaviour directly impacts application performance, causing delays. We are working on implementing asynchronous logging, but wanted to check how we can keep the flush time as low as possible. Do you see any improvements we could make?
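One way to make the slow flushes visible during these tests is Fluentd's standard slow_flush_log_threshold output parameter; a minimal sketch (the 1-second budget is an assumption, Fluentd's default is 20 seconds):

  # Inside the existing <match tcp.events.debug> block:
  slow_flush_log_threshold 1.0   # warn whenever a single flush takes longer than 1s (illustrative budget)

The warnings can then be correlated with the performance-test timeline to see which chunks or hosts are slow.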
-
Do you think HTTPS and authentication could cause any significant delays? We didn’t have these enabled in our previous setup.
-
I see.
I'm not sure... It could be due to differences in that configuration, or to the version differences between the two setups. Could you narrow down the cause through comparative testing?
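If it helps with the comparison, Fluentd's built-in monitor_agent input exposes per-plugin buffer and retry counters over HTTP, so both environments can be measured the same way during a test run; a minimal sketch:

  <source>
    @type monitor_agent
    bind 0.0.0.0
    port 24220   # metrics are then available at http://<fluentd-pod>:24220/api/plugins.json
  </source>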
-
What is the problem?
Hi Team,
We’re running an OpenSearch cluster deployed on k8s using opensearch-k8s-operator with 3 master nodes and 5 hot nodes. Fluentd (2 pods) is handling log forwarding within the same cluster, sending logs to the hot nodes via a headless service. We’ve also tested pointing directly to the pod IPs. Under normal load, everything works great — flush times stay around 10–25ms. But during performance testing, we’re seeing flush times increase to 400ms or more.
Appreciate any pointers!
I've also attached screenshots of metrics such as flush time, flush errors, and retries.



Describe the configuration of Fluentd
  # Output debug to opensearch
  <match tcp.events.debug>
    @type opensearch
    time_key time
    keep_time_key true
    logstash_format true
    logstash_prefix debug
    @log_level warn
    request_timeout 60s
    reconnect_on_error true
    reload_on_failure true
    reload_connections false
    http_backend typhoeus
    <buffer>
      @type memory
      chunk_limit_size 16MB
      total_limit_size 512MB
      queued_chunks_limit_size 32
      retry_type periodic
      flush_at_shutdown true
      flush_mode interval
      flush_interval 3s
      flush_thread_count 2
      overflow_action drop_oldest_chunk
      retry_wait 1s
    </buffer>
    hosts opensearch-cluster-hot-nodes-0.opensearch-service.opensearch.svc.cluster.local:9200,opensearch-cluster-hot-nodes-1.opensearch-service.opensearch.svc.cluster.local:9200,opensearch-cluster-hot-nodes-2.opensearch-service.opensearch.svc.cluster.local:9200,opensearch-cluster-hot-nodes-3.opensearch-service.opensearch.svc.cluster.local:9200,opensearch-cluster-hot-nodes-4.opensearch-service.opensearch.svc.cluster.local:9200
    user "#{ENV['FLUENT_OPENSEARCH_USER']}"
    password "#{ENV['FLUENT_OPENSEARCH_PASSWORD']}"
    scheme https
    ssl_verify false
  </match>

Describe the logs of Fluentd
No response
Environment