Splunk Operator: Indexer pod slow to become Ready when PVC has a high file count #1648

@ductrung-nguyen

Please select the type of request

Enhancement

Tell us more

Describe the request

  • When a new Splunk indexer pod is created (or an existing one is re-created after a restart or reschedule), startup can become extremely slow if the attached PVC already contains a very large number of files (hundreds of thousands).
  • From what we observe, OpenShift performs a recursive ownership/permission pass (chown/chmod) on the PVC during mount, and this step becomes a major bottleneck at high file counts.
  • In practice, the pod sits for a long time in the “volume config / mount” phase before the container actually starts. We often see timeout-like symptoms at that stage, after which the pod fails its readiness probe and is reported as not Ready.
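
To put a number on "high file count", a rough survey of the volume can help. The following is only a sketch, not part of the operator: it walks a directory tree, counts entries, and times one `lstat()` per file as a loose proxy for per-file metadata cost. The function name `survey_tree` and the example path are assumptions for illustration, and the timing does not reproduce what the kubelet or CRI-O actually does during mount.

```python
import os
import time


def survey_tree(root):
    """Count files and directories under root and time one lstat() per file.

    Hypothetical helper: the elapsed time is only a rough indicator of how
    expensive a per-file metadata pass over the volume might be.
    """
    files = 0
    dirs = 0
    start = time.perf_counter()
    for dirpath, dirnames, filenames in os.walk(root):
        dirs += len(dirnames)
        for name in filenames:
            try:
                os.lstat(os.path.join(dirpath, name))
            except OSError:
                # Files can vanish while Splunk is running; skip them.
                continue
            files += 1
    elapsed = time.perf_counter() - start
    return {"files": files, "dirs": dirs, "seconds": elapsed}


if __name__ == "__main__":
    # Assumed mount path; adjust to wherever the indexer PVC is mounted.
    print(survey_tree("/opt/splunk/var"))
```

Running this inside the indexer container (or against a mounted copy of the volume) gives a file count to attach to the issue alongside the observed time-to-ready.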

Expected behavior

  • Indexer pod creation/restart should complete in a reasonable timeframe, even when the PVC is already populated with a lot of files.

Splunk setup on K8S

  • Splunk Operator managed deployment.
  • Indexer pods using PVCs for indexer data (high file count / large data volumes).
  • This affects normal ops actions too (rolling upgrade, node maintenance, probes, etc.), not only initial deployment.

Reproduction/Testing steps

  1. Use an indexer PVC already populated with a very high number of files (typical indexer data layout).
  2. Trigger indexer pod re-creation (restart/reschedule/rolling upgrade).
  3. Observe that the pod is stuck or very slow during volume mount / container volume configuration.
  4. Check events/logs: we frequently see errors like CRI-O / kubelet “context deadline exceeded” and “name is reserved” during startup.
  5. In worst cases, time-to-ready can reach ~30 mins per pod.
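
Steps 4 and 5 can be made measurable with a small script. This is a minimal sketch, assuming the pod and event objects have been exported as JSON (for example with `oc get pod <name> -o json` and `oc get events -o json`). It relies only on the standard `metadata.creationTimestamp`, `status.conditions`, and event `message` fields; the function names are hypothetical.

```python
import json
from datetime import datetime, timezone

# Error fragments mentioned in step 4 of the reproduction steps.
SYMPTOMS = ("context deadline exceeded", "name is reserved")


def parse_k8s_time(ts):
    """Parse a Kubernetes timestamp such as 2024-01-01T10:00:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def time_to_ready_seconds(pod):
    """Seconds from pod creation to the last transition to Ready=True.

    Returns None if the pod is not currently Ready. If readiness flapped,
    this reflects the most recent transition, not the first one.
    """
    created = parse_k8s_time(pod["metadata"]["creationTimestamp"])
    for cond in pod.get("status", {}).get("conditions", []):
        if cond.get("type") == "Ready" and cond.get("status") == "True":
            ready = parse_k8s_time(cond["lastTransitionTime"])
            return (ready - created).total_seconds()
    return None


def matching_events(event_list):
    """Return events whose message contains one of the known symptoms."""
    return [
        ev
        for ev in event_list.get("items", [])
        if any(s in ev.get("message", "").lower() for s in SYMPTOMS)
    ]


def load_json(path):
    """Load a JSON document previously saved from the CLI."""
    with open(path) as fh:
        return json.load(fh)
```

For example, `time_to_ready_seconds(load_json("pod.json")) / 60` gives the time-to-ready in minutes for a saved pod object, which can be compared across PVCs with different file counts.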

K8s environment

  • OpenShift (OCP) cluster.
  • Issue happens during pod start, around PVC mount / volume preparation.
  • Frequency: indexer pods can be restarted multiple times per month, so the operational impact adds up quickly.

Proposed changes(optional)

  • None for now. Filing this as an issue because the current behavior causes major delays in indexer restart/recovery scenarios.

K8s collector data(optional)

Additional context(optional)
