
@laik

@laik laik commented Nov 10, 2025

Title: Introduce PVC Manager for High-Performance LocalPV Provisioning

Description:

This PR introduces a new architecture for the OpenEBS Dynamic LocalPV Provisioner, addressing performance bottlenecks identified during stress testing (see openebs/openebs#4050). The core change replaces the traditional helper pod mechanism with a DaemonSet-based PVC Manager service that performs volume operations via direct HTTP API calls.

Key Changes:

  1. New PVC Manager Component:

    • A new pvc-manager binary (cmd/pvc-manager/) is added; it runs as a DaemonSet on every node.
    • It exposes HTTP endpoints for creating directories, applying quotas, and deleting volumes.
    • This eliminates the overhead of creating and terminating ephemeral helper pods for each operation.
  2. Provisioner Integration:

    • The main provisioner logic (cmd/provisioner-localpv/) is updated to conditionally use the new PVC Manager.
    • A new client (cmd/provisioner-localpv/app/pvc_manager_client.go) handles communication with the PVC Manager service.
    • New helper functions (cmd/provisioner-localpv/app/helper_pvc_manager.go) orchestrate the HTTP calls.
    • The environment variables OPENEBS_IO_ENABLE_PVC_MANAGER (default true) and OPENEBS_IO_PVC_MANAGER_PORT (default 8080) control whether the PVC Manager is used and which port the provisioner uses to reach it.
  3. Deployment Updates:

    • New Kubernetes manifests are added for deploying the PVC Manager DaemonSet and its associated RBAC (deploy/kubectl/pvc-manager-*, deploy/helm/charts/templates/pvc-manager-*).
    • Documentation is provided in design/pvc-manager-architecture.md and deploy/helm/charts/PVC-MANAGER-INTEGRATION.md.

Benefits:

  • Significantly Improved Performance: As highlighted in issue #4050, this change drastically reduces PVC binding times, especially under load, by removing the pod creation bottleneck.
  • Reduced API Server Load: Fewer transient pod objects are created and managed by the Kubernetes API server.
  • Enhanced Reliability: A long-running service is more predictable than ephemeral pods, each of which must be scheduled, started, and cleaned up.
  • Better Resource Utilization: The DaemonSet model provides a more consistent resource footprint.

This new architecture provides a substantial performance improvement for LocalPV provisioning, directly addressing the concerns raised in issue #4050. Backward compatibility with the helper pod mode is maintained via the OPENEBS_IO_ENABLE_PVC_MANAGER environment variable.
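The fallback comes down to reading two environment variables. A minimal Go sketch, assuming invalid values simply fall back to the documented defaults (the PR does not say how the provisioner treats them); `loadPVCManagerConfig` and `pvcManagerConfig` are hypothetical names:

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// pvcManagerConfig mirrors the two environment variables described in the PR.
type pvcManagerConfig struct {
	Enabled bool
	Port    int
}

// loadPVCManagerConfig reads OPENEBS_IO_ENABLE_PVC_MANAGER (default true) and
// OPENEBS_IO_PVC_MANAGER_PORT (default 8080). Unparseable or out-of-range
// values keep the defaults; this fallback policy is an ASSUMPTION.
func loadPVCManagerConfig(getenv func(string) string) pvcManagerConfig {
	cfg := pvcManagerConfig{Enabled: true, Port: 8080}
	if v := getenv("OPENEBS_IO_ENABLE_PVC_MANAGER"); v != "" {
		if b, err := strconv.ParseBool(v); err == nil {
			cfg.Enabled = b
		}
	}
	if v := getenv("OPENEBS_IO_PVC_MANAGER_PORT"); v != "" {
		if p, err := strconv.Atoi(v); err == nil && p > 0 && p <= 65535 {
			cfg.Port = p
		}
	}
	return cfg
}

func main() {
	cfg := loadPVCManagerConfig(os.Getenv)
	if cfg.Enabled {
		fmt.Printf("using PVC Manager on port %d\n", cfg.Port)
	} else {
		// Backward-compatible path: one ephemeral helper pod per operation.
		fmt.Println("falling back to helper pods")
	}
}
```

Passing `os.Getenv` as a parameter keeps the function testable with a map-backed lookup.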

@laik laik requested a review from a team as a code owner November 10, 2025 15:19
@niladrih
Member

Hi @laik. I'm yet to review your PR. Just letting you know that we require DCO signatures on all commits. You might have already noticed the failing CI job.

…and related fixes

Signed-off-by: laik <laik.lj@me.com>
@laik
Author

laik commented Nov 12, 2025

> Hi @laik. I'm yet to review your PR. Just letting you know that we require DCO signatures on all commits. You might have already noticed the failing CI job.

Yes, I saw it; the DCO sign-off has been added.

Signed-off-by: laik <laik.lj@me.com>
@codecov-commenter

codecov-commenter commented Nov 18, 2025


Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 39.18%. Comparing base (cab53c4) to head (edfde5f).
⚠️ Report is 61 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop     #290      +/-   ##
===========================================
+ Coverage    37.91%   39.18%   +1.26%     
===========================================
  Files           36        1      -35     
  Lines         3373      684    -2689     
===========================================
- Hits          1279      268    -1011     
+ Misses        2012      407    -1605     
+ Partials        82        9      -73     
Flag Coverage Δ
integrationtests 39.18% <ø> (?)

Flags with carried forward coverage won't be shown.


Empty commit to trigger CI build

Signed-off-by: laik <laik.lj@me.com>
The pvc-manager Dockerfile expects the binary to be present in the build context, but the GitHub Actions workflow was not copying it there. This change ensures the binary is built and copied to the correct location before the Docker build step.

Signed-off-by: laik <laik.lj@me.com>
Member

@tiagolobocastro tiagolobocastro left a comment


Did you consider using the distributed CSI node-plugin approach rather than a custom service?
Could you please add the xfs_quota fix as a separate commit?

@@ -0,0 +1,260 @@
#!/bin/bash
Member


Suggested change
-#!/bin/bash
+#!/usr/bin/env bash

@tiagolobocastro
Member

The image build seems to be failing, could you please take a look?

@laik
Author

laik commented Nov 29, 2025

> The image build seems to be failing, could you please take a look?

Yeah, let me look into this issue.

> Did you consider using the distributed CSI node-plugin approach rather than a custom service? Could you please add the xfs_quota fix as a separate commit?

What model does "distributed CSI" refer to?

Signed-off-by: laik <laik.lj@me.com>
@tiagolobocastro
Member

> What model does distributed CSI refer to?

Node deployment (--node-deployment): https://github.com/kubernetes-csi/external-provisioner?tab=readme-ov-file#distributed-provisioning

This is the same model used by rawfile-localpv.
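In the distributed model, one provisioner instance runs on every node and only acts on claims the scheduler has placed on that node, as recorded in the PVC's `volume.kubernetes.io/selected-node` annotation (set when the StorageClass uses `WaitForFirstConsumer` binding). Because the work happens on the node that owns the volume, no cross-node HTTP call is needed. The Go sketch below illustrates only that ownership filter; it is not external-provisioner's code, and reading the node name from `NODE_NAME` is an assumption about how the pod is configured.

```go
package main

import (
	"fmt"
	"os"
)

// selectedNodeAnnotation is set on a PVC by the scheduler during delayed
// (WaitForFirstConsumer) volume binding.
const selectedNodeAnnotation = "volume.kubernetes.io/selected-node"

// ownsClaim reports whether the provisioner instance running on nodeName
// should handle a claim carrying the given annotations.
func ownsClaim(annotations map[string]string, nodeName string) bool {
	selected, ok := annotations[selectedNodeAnnotation]
	return ok && nodeName != "" && selected == nodeName
}

func main() {
	// ASSUMPTION: NODE_NAME is injected via the Downward API (spec.nodeName).
	node := os.Getenv("NODE_NAME")
	pvcAnnotations := map[string]string{selectedNodeAnnotation: "worker-1"}
	fmt.Println(ownsClaim(pvcAnnotations, node))
}
```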

@laik
Author

laik commented Dec 1, 2025

> > What model does distributed CSI refer to?
>
> Node deployment (--node-deployment): https://github.com/kubernetes-csi/external-provisioner?tab=readme-ov-file#distributed-provisioning
>
> This is the same model used by rawfile-localpv.

OK, I took a look. This model is better and avoids the complexity of the network calls. I'll follow up with the next submission after testing. Thanks a lot for the reminder!



4 participants