feat!: use subchart pattern for 3rd party helm charts (HDX-2025)#188
Open
🦋 Changeset detected. Latest commit: 7cf02fb. The changes in this PR will be included in the next version bump. This PR includes changesets to release 1 package.
Drop the legacy hdx-oss-v2 chart and its CI workflows. All users should migrate to the clickstack chart. BREAKING CHANGE: The hdx-oss-v2 chart is no longer published. Made-with: Cursor
Replace the hand-rolled MongoDB Deployment/Service/PVC templates with the MongoDB Kubernetes Operator (MCK) as a subchart dependency. A thin passthrough template renders the full MongoDBCommunity CRD spec from values, giving users direct control over all CRD fields. BREAKING CHANGE: The mongodb.* values structure has changed. MongoDB is now managed via a MongoDBCommunity custom resource with SCRAM auth. See mongodb.spec in values.yaml for the new configuration surface. Made-with: Cursor
Replace the hand-rolled OTEL Collector Deployment/Service templates with the official OpenTelemetry Collector Helm chart as a subchart dependency. A parent-chart ConfigMap (otel-collector-env.yaml) injects dynamic environment variables via a volume mount + shell wrapper, working around the upstream chart's lack of tpl support on extraEnvs/extraEnvsFrom. BREAKING CHANGE: The otel.* values structure has changed. The collector is now configured via the otel-collector.* subchart values. Service discovery env vars are in otel.* and rendered into a ConfigMap. Made-with: Cursor
Replace the hand-rolled ClickHouse Deployment/Service/ConfigMaps/PVCs and data/ XML config files with the ClickHouse Operator as a subchart dependency. Thin passthrough templates render ClickHouseCluster and KeeperCluster CRD specs from values, giving users full control over all operator fields. BREAKING CHANGE: The clickhouse.* values structure has changed. ClickHouse is now managed via ClickHouseCluster and KeeperCluster custom resources. See clickhouse.cluster.spec and clickhouse.keeper.spec in values.yaml for the new configuration surface. Made-with: Cursor
Add docs/UPGRADE.md covering the migration from inline-template chart (v1.x) to the subchart-based architecture. Includes value mapping tables for MongoDB, ClickHouse, and OTEL Collector, plus guidance on fresh install vs. in-place upgrade. Made-with: Cursor
Replace the per-component app-configmap, app-secrets, clickhouse-secrets, and otel-collector-env with a single clickstack-config ConfigMap and clickstack-secret Secret. Both use static names and are populated from hyperdx.config and hyperdx.secrets values, shared by HyperDX and the OTEL collector via envFrom. Remove the otel: values block and env.sh shell wrapper. The subchart condition moves to otel-collector.enabled. BREAKING CHANGE: Environment variables are now managed via hyperdx.config (ConfigMap) and hyperdx.secrets (Secret). The otel.* values block has been removed. Set otel-collector.enabled to false to disable the OTEL collector. Made-with: Cursor
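As a rough illustration of the consolidated layout described in this commit, the sketch below writes a small values override. The key paths hyperdx.config, hyperdx.secrets, and otel-collector.enabled come from the commit message; FRONTEND_URL and MONGODB_PASSWORD are named elsewhere in this PR. The function name, the file name, and the example values are placeholders, not part of the chart.

```shell
#!/usr/bin/env bash
# Write a values override that uses the consolidated layout.
# write_override, the default file name, and the values are hypothetical.
write_override() {
  local out="${1:-values-override.yaml}"
  cat > "$out" <<'EOF'
hyperdx:
  config:
    # Non-secret settings, rendered into the clickstack-config ConfigMap
    FRONTEND_URL: "http://localhost:3000"
  secrets:
    # Secret settings, rendered into the clickstack-secret Secret
    MONGODB_PASSWORD: "change-me"
otel-collector:
  # Replaces the removed otel.* block as the on/off switch
  enabled: false
EOF
}
```

The resulting file could then be passed to Helm with -f, for example `helm upgrade --install <release> <chart> -f values-override.yaml`, where the release and chart names depend on the installation.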
Rename mongodb-kubernetes subchart alias to mongodb-operator for consistency. Move MongoDB password from mongodb.password to hyperdx.secrets.MONGODB_PASSWORD so all secrets are centralized. The mongodb-password-secret.yaml template remains as a bridge to the MCK operator's required "password" key format. Made-with: Cursor
Move ClickHouse user credentials from clickhouse.config.users into hyperdx.secrets, eliminating the clickhouse.config.users block. All credentials are now managed in a single location (hyperdx.secrets) and shared via the clickstack-secret Secret. Made-with: Cursor
Reorganize the hyperdx: values block by Kubernetes resource type:
- Ports shared across resources under hyperdx.ports.*
- Deployment-specific settings under hyperdx.deployment.*
- Tasks moved from top-level tasks: to hyperdx.tasks
- Remove deprecated appUrl (frontendUrl defaults to http://localhost:3000)
BREAKING CHANGE: All hyperdx.* value paths have changed. Deployment settings (image, replicas, probes, nodeSelector, etc.) are now under hyperdx.deployment.*. Ports are under hyperdx.ports.*. Tasks moved from tasks.* to hyperdx.tasks.*.
Made-with: Cursor
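Because these paths moved, an existing values file may still carry keys the chart no longer reads. The following is a minimal pre-upgrade check, assuming a plain grep over the file is good enough; it only looks for the top-level tasks: and otel: blocks and for any appUrl key, all of which the commits in this PR describe as moved or removed. The function name and messages are hypothetical, and it is not a substitute for the mapping tables in docs/UPGRADE.md.

```shell
#!/usr/bin/env bash
# Flag legacy keys in a values file before upgrading.
# check_legacy_values is a hypothetical helper, not part of the chart.
check_legacy_values() {
  local file="$1" status=0
  # Top-level tasks: moved under hyperdx.tasks
  if grep -Eq '^tasks:' "$file"; then
    echo "top-level tasks: found; move it under hyperdx.tasks"
    status=1
  fi
  # The otel.* block was removed in favour of otel-collector.*
  if grep -Eq '^otel:' "$file"; then
    echo "top-level otel: found; this block was removed"
    status=1
  fi
  # appUrl was removed
  if grep -Eq '^[[:space:]]*appUrl:' "$file"; then
    echo "appUrl found; this value was removed"
    status=1
  fi
  return "$status"
}
```

A non-zero return means at least one legacy key was found, so the check can gate an upgrade step in a script.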
- Rewrite UPGRADE.md to match the actual current values structure
- Update README OTEL description for shared ConfigMap/Secret
- Add charts/*/charts/*.tgz to .gitignore and remove tracked tarballs
- Reorganize templates into hyperdx/, clickhouse/, mongodb/ subdirs
- Update smoke test for operator-managed ClickHouse/MongoDB
- Create major changeset for subchart migration
Made-with: Cursor
2613d82 to 0857b64
The subchart dependencies (mongodb-kubernetes, opentelemetry-collector, clickhouse-operator-helm) must be downloaded before helm install or helm unittest. Add dependency build step to both workflows and update the integration test values to match the new values structure. Made-with: Cursor
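The ordering this commit describes can be sketched as a small wrapper: fetch the subchart tarballs first, then run the unit tests. The chart directory is passed in as an argument because the exact path used in the workflows is not shown here, and `helm unittest` assumes the helm-unittest plugin is installed. The function names are hypothetical.

```shell
#!/usr/bin/env bash
# Download subchart dependencies before rendering or installing.
build_deps() {
  local chart_dir="${1:?chart directory required}"
  helm dependency build "$chart_dir"
}

# Build dependencies, then run unit tests only if the build succeeded.
prepare_and_test() {
  local chart_dir="${1:?chart directory required}"
  build_deps "$chart_dir" && helm unittest "$chart_dir"
}
```

For example, `prepare_and_test charts/clickstack` would run both steps against that directory, if that is where the chart lives.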
Made-with: Cursor
Create a new clickstack-operators chart that bundles the MongoDB and ClickHouse operator subcharts. This must be installed before the main clickstack chart so that CRDs are registered before CRs are created. This fixes the Helm CRD ordering issue where operator CRDs installed via subchart templates are not yet registered when the parent chart tries to create custom resources in the same release. Made-with: Cursor
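A minimal sketch of the resulting two-step install, assuming both charts are installed from local directories. The release names and the ./charts/... paths are illustrative assumptions; only the ordering (operators first, then clickstack) comes from the commit message.

```shell
#!/usr/bin/env bash
# Install the operators chart first so its CRDs are registered,
# then install the main chart that creates the custom resources.
# Release names and chart paths below are hypothetical.
install_clickstack() {
  local ns="${1:-default}"
  # Step 1: operators and their CRDs. --wait blocks until the
  # operator workloads are ready; stop here if this step fails.
  helm install clickstack-operators ./charts/clickstack-operators \
    --namespace "$ns" --create-namespace --wait || return 1
  # Step 2: the main chart, which renders the custom resources.
  helm install clickstack ./charts/clickstack --namespace "$ns"
}
```

Stopping after a failed first step avoids the situation the commit describes, where custom resources are submitted before their CRDs exist.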
These values are no longer referenced in any template since MongoDB and ClickHouse are operator-managed. Storage class and PVC lifecycle are now configured directly in the operator CR specs. Document PVC retention behavior and storage class migration in README and UPGRADE guide with links to operator docs. Made-with: Cursor
The ClickHouse Operator does not define a 'readonly' profile by default, causing the server to crash on startup. The app user's read-only semantics are already enforced via grants. Made-with: Cursor
Made-with: Cursor
…ice name Service endpoints (CLICKHOUSE_ENDPOINT, MONGO_URI, OTEL_EXPORTER_OTLP_ENDPOINT, etc.) were hardcoded in configmap.yaml via helpers, making them impossible to override for users with external services. Move all computed endpoints into hyperdx.config as tpl-rendered defaults so they can be overridden in values.yaml. Also fix clickstack.clickhouse.svc helper to append "-clickhouse" suffix, matching the actual service name the ClickHouse Operator creates. Made-with: Cursor
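With the endpoints exposed as overridable defaults, pointing the chart at external services might look like the sketch below. MONGO_URI and CLICKHOUSE_ENDPOINT are the variable names listed in the commit message; the function name, the output file, and the example URLs in the usage note are placeholders.

```shell
#!/usr/bin/env bash
# Write a values override that replaces the computed endpoint defaults.
# write_external_endpoints is a hypothetical helper.
write_external_endpoints() {
  local out="$1" mongo_uri="$2" clickhouse_endpoint="$3"
  cat > "$out" <<EOF
hyperdx:
  config:
    MONGO_URI: "${mongo_uri}"
    CLICKHOUSE_ENDPOINT: "${clickhouse_endpoint}"
EOF
}
```

A call such as `write_external_endpoints external.yaml "mongodb://mongo.example.internal:27017/hyperdx" "http://clickhouse.example.internal:8123"` produces a file that can be passed to Helm with -f. Writing a file rather than using --set avoids having to escape commas in connection strings.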
…lags The install notes referenced stale advice about using operators separately and had incorrect disable flags. Updated to reflect the current subchart architecture and document the correct enabled flags for each component. Made-with: Cursor
The OTEL collector's seed step fails with DNS errors when ClickHouse is not yet registered in CoreDNS. Add kubectl wait for ClickHouseCluster and MongoDBCommunity readiness after helm install, giving DNS time to propagate before the collector retries. Also remove stale hyperdx.frontendUrl from test-values (moved to hyperdx.config.FRONTEND_URL which defaults correctly). Made-with: Cursor
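The wait step could be sketched roughly as follows. The readiness checks are assumptions: the exact condition the workflow waits on is not shown in this commit message, so the Ready condition for ClickHouseCluster and the Running phase for MongoDBCommunity should be verified against the status fields each operator actually reports. The function name is hypothetical.

```shell
#!/usr/bin/env bash
# Block until the operator-managed databases report readiness.
# The --for expressions are assumptions; check each operator's CR status.
wait_for_databases() {
  local ns="$1" timeout="${2:-300s}"
  kubectl wait clickhousecluster --all -n "$ns" \
    --for=condition=Ready --timeout="$timeout" || return 1
  kubectl wait mongodbcommunity --all -n "$ns" \
    --for=jsonpath='{.status.phase}'=Running --timeout="$timeout"
}
```

Running this right after helm install gives the services time to appear in cluster DNS before the collector's next retry.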
The ClickHouse Operator only creates a headless service
({CR}-clickhouse-headless), not a regular ClusterIP service. The
previous helper generated a non-existent DNS name, causing the OTEL
collector seed step to fail with "no such host" and CrashLoopBackOff.
Made-with: Cursor
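To make the naming concrete, here is a small sketch that builds the headless service hostname from a custom resource name and namespace. The {CR}-clickhouse-headless pattern is taken from the commit message; the cluster.local suffix is the common default cluster domain and is an assumption here, as is the function name.

```shell
#!/usr/bin/env bash
# Build the in-cluster DNS name of the headless ClickHouse service.
# Assumes the default cluster domain, cluster.local.
clickhouse_headless_host() {
  local cr="$1" ns="$2"
  printf '%s-clickhouse-headless.%s.svc.cluster.local\n' "$cr" "$ns"
}
```

For a custom resource named demo in namespace obs, this prints demo-clickhouse-headless.obs.svc.cluster.local.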
The OTLP HTTP receiver on port 4318 isn't available until the OpAMP supervisor receives its pipeline config from the HyperDX app. Replace the nc pre-check with curl --retry flags so data ingestion requests retry through the startup delay instead of failing immediately. Made-with: Cursor
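A sketch of what a retrying request could look like. The exact flags used in the smoke test are not shown in this commit message, so the retry count, the delay, and the use of --retry-connrefused are illustrative choices, and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# POST an OTLP/HTTP payload, retrying while the receiver is still starting.
# Retry count and delay are illustrative, not the values used in CI.
post_otlp_with_retry() {
  local url="$1" payload_file="$2"
  curl --silent --show-error --fail \
    --retry 10 --retry-delay 5 --retry-connrefused \
    -H 'Content-Type: application/json' \
    --data-binary "@${payload_file}" \
    "$url"
}
```

For example, `post_otlp_with_retry http://localhost:4318/v1/logs logs.json` would send a JSON log payload to the standard OTLP/HTTP logs path on a forwarded port.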
kubectl port-forward terminates when the target port isn't listening inside the pod. The OTLP HTTP receiver (4318) doesn't bind until the OpAMP supervisor fetches its pipeline config from the HyperDX app, which can take minutes after pod readiness. Replace the single port-forward + curl --retry approach with a send_otlp() helper that starts a fresh tunnel on every attempt. Make ingestion failures non-fatal since OpAMP config propagation may exceed the retry budget in resource-constrained CI environments. Verified with act locally: job passes with warnings on data ingestion. Made-with: Cursor
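The approach described above can be sketched as follows. This is not the helper from the PR, only an illustration of the same idea: open a new tunnel for each attempt, try the request, tear the tunnel down, and repeat. The argument order, the attempt count, and the delays are assumptions.

```shell
#!/usr/bin/env bash
# Send an OTLP/HTTP payload, opening a fresh port-forward per attempt.
# Arguments: namespace, target (e.g. svc/<name>), URL path, payload file,
# optional attempt count, optional delay between attempts in seconds.
send_otlp() {
  local ns="$1" target="$2" path="$3" payload_file="$4"
  local attempts="${5:-5}" delay="${6:-5}"
  local i=1 pf_pid
  while [ "$i" -le "$attempts" ]; do
    # A new tunnel each time, since the previous one may have exited.
    kubectl port-forward -n "$ns" "$target" 4318:4318 >/dev/null 2>&1 &
    pf_pid=$!
    sleep 2  # give the tunnel a moment to come up
    if curl --silent --fail --max-time 10 \
        -H 'Content-Type: application/json' \
        --data-binary "@${payload_file}" \
        "http://localhost:4318${path}" >/dev/null; then
      kill "$pf_pid" 2>/dev/null || true
      wait "$pf_pid" 2>/dev/null || true
      return 0
    fi
    kill "$pf_pid" 2>/dev/null || true
    wait "$pf_pid" 2>/dev/null || true
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

To keep a failure non-fatal, as the commit describes, the caller can downgrade it to a warning: `send_otlp "$ns" "$target" /v1/logs logs.json || echo "WARNING: OTLP ingestion did not succeed"`.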
The OTLP data ingestion test never validated end-to-end delivery (it only checked the HTTP status from the collector, not whether data reached ClickHouse). Worse, the OpAMP supervisor consistently fails to receive its pipeline config in time during CI, so the OTLP HTTP receiver on port 4318 never starts -- making the retries burn ~6.5 minutes of dead CI time per run. Remove the ingestion section and the 30-second data wait. The smoke test still validates pod readiness, HyperDX UI, OTEL collector metrics, and database CR health. Made-with: Cursor
Summary
- Drop the legacy hdx-oss-v2 chart
- Manage MongoDB through a MongoDBCommunity CR
- Manage ClickHouse through ClickHouseCluster and KeeperCluster CRs
- Consolidate configuration into the clickstack-config ConfigMap and clickstack-secret Secret
- Reorganize hyperdx: values by Kubernetes resource type (deployment, service, ingress, config, secrets, tasks)
- Centralize credentials in hyperdx.secrets

Breaking changes
All values under mongodb, clickhouse, otel, hyperdx, and tasks have changed structure. See docs/UPGRADE.md for the complete migration guide.

Test plan
- helm install on a Kubernetes cluster
- otel-collector.enabled: false properly disables the OTEL collector

Made with Cursor