feat!: use subchart pattern for 3rd party helm charts (HDX-2025) #188

Open
dhable wants to merge 23 commits into main from dan/hdx-2025-helm-use-subchart-pattern-to-include-3rd-party-helm-charts

@dhable (Collaborator) commented Mar 4, 2026

Summary

  • Remove the deprecated hdx-oss-v2 chart
  • Replace inline MongoDB templates with MongoDB Kubernetes Operator (MCK) subchart + passthrough MongoDBCommunity CR
  • Replace inline OTEL Collector templates with the official OpenTelemetry Collector Helm chart subchart
  • Replace inline ClickHouse templates with the ClickHouse Operator subchart + passthrough ClickHouseCluster and KeeperCluster CRs
  • Unify environment variables into shared clickstack-config ConfigMap and clickstack-secret Secret
  • Restructure hyperdx: values by Kubernetes resource type (deployment, service, ingress, config, secrets, tasks)
  • Centralize all credentials in hyperdx.secrets

Breaking changes

All values under mongodb, clickhouse, otel, hyperdx, and tasks have changed structure. See docs/UPGRADE.md for the complete migration guide.

Test plan

  • All 140 helm unit tests pass
  • Integration test with helm install on a Kubernetes cluster
  • Verify MongoDB, ClickHouse, and OTEL Collector operators reconcile their CRs
  • Verify HyperDX app connects to all services
  • Verify otel-collector.enabled: false properly disables the OTEL collector

Made with Cursor

@changeset-bot (bot) commented Mar 4, 2026

🦋 Changeset detected

Latest commit: 7cf02fb

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package:

Name         Type
helm-charts  Major

dhable added 10 commits March 4, 2026 15:22
Drop the legacy hdx-oss-v2 chart and its CI workflows. All users should
migrate to the clickstack chart.

BREAKING CHANGE: The hdx-oss-v2 chart is no longer published.

Made-with: Cursor
Replace the hand-rolled MongoDB Deployment/Service/PVC templates with
the MongoDB Kubernetes Operator (MCK) as a subchart dependency. A thin
passthrough template renders the full MongoDBCommunity CRD spec from
values, giving users direct control over all CRD fields.

BREAKING CHANGE: The mongodb.* values structure has changed. MongoDB is
now managed via a MongoDBCommunity custom resource with SCRAM auth.
See mongodb.spec in values.yaml for the new configuration surface.

Made-with: Cursor
Replace the hand-rolled OTEL Collector Deployment/Service templates with
the official OpenTelemetry Collector Helm chart as a subchart dependency.
A parent-chart ConfigMap (otel-collector-env.yaml) injects dynamic
environment variables via a volume mount + shell wrapper, working around
the upstream chart's lack of tpl support on extraEnvs/extraEnvsFrom.

BREAKING CHANGE: The otel.* values structure has changed. The collector
is now configured via the otel-collector.* subchart values. Service
discovery env vars are in otel.* and rendered into a ConfigMap.

Made-with: Cursor
Replace the hand-rolled ClickHouse Deployment/Service/ConfigMaps/PVCs
and data/ XML config files with the ClickHouse Operator as a subchart
dependency. Thin passthrough templates render ClickHouseCluster and
KeeperCluster CRD specs from values, giving users full control over
all operator fields.

BREAKING CHANGE: The clickhouse.* values structure has changed.
ClickHouse is now managed via ClickHouseCluster and KeeperCluster
custom resources. See clickhouse.cluster.spec and clickhouse.keeper.spec
in values.yaml for the new configuration surface.

Made-with: Cursor
Add docs/UPGRADE.md covering the migration from the inline-template chart
(v1.x) to the subchart-based architecture. Includes value mapping
tables for MongoDB, ClickHouse, and OTEL Collector, plus guidance on
fresh install vs. in-place upgrade.

Made-with: Cursor
Replace the per-component app-configmap, app-secrets, clickhouse-secrets,
and otel-collector-env with a single clickstack-config ConfigMap and
clickstack-secret Secret. Both use static names and are populated from
hyperdx.config and hyperdx.secrets values, shared by HyperDX and the
OTEL collector via envFrom.

Remove the otel: values block and env.sh shell wrapper. The subchart
condition moves to otel-collector.enabled.

BREAKING CHANGE: Environment variables are now managed via
hyperdx.config (ConfigMap) and hyperdx.secrets (Secret). The otel.*
values block has been removed. Set otel-collector.enabled to false
to disable the OTEL collector.

Made-with: Cursor
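
The disable flag described in the commit above can be exercised with a render-only check. This is a minimal sketch, not a command taken from the PR: the helper name `render_without_collector`, the release name `clickstack`, and the chart path `charts/clickstack` are assumptions for illustration. Only the `otel-collector.enabled` flag comes from the commit message.

```shell
# Hypothetical helper: render the chart with the OTEL collector subchart disabled.
# Assumes Helm 3 and that the subchart dependencies have already been built.
render_without_collector() {
  local release="${1:-clickstack}"        # assumed release name
  local chart="${2:-charts/clickstack}"   # assumed chart path
  helm template "$release" "$chart" --set otel-collector.enabled=false
}
```

Piping the output through a search for collector resource names is one way to confirm that nothing from the subchart is rendered.
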
Rename mongodb-kubernetes subchart alias to mongodb-operator for
consistency. Move MongoDB password from mongodb.password to
hyperdx.secrets.MONGODB_PASSWORD so all secrets are centralized.
The mongodb-password-secret.yaml template remains as a bridge to
the MCK operator's required "password" key format.

Made-with: Cursor
Move ClickHouse user credentials from clickhouse.config.users into
hyperdx.secrets, eliminating the clickhouse.config.users block. All
credentials are now managed in a single location (hyperdx.secrets)
and shared via the clickstack-secret Secret.

Made-with: Cursor
Reorganize the hyperdx: values block by Kubernetes resource type:
- Ports shared across resources under hyperdx.ports.*
- Deployment-specific settings under hyperdx.deployment.*
- Tasks moved from top-level tasks: to hyperdx.tasks
- Remove deprecated appUrl (frontendUrl defaults to http://localhost:3000)

BREAKING CHANGE: All hyperdx.* value paths have changed. Deployment
settings (image, replicas, probes, nodeSelector, etc.) are now under
hyperdx.deployment.*. Ports are under hyperdx.ports.*. Tasks moved
from tasks.* to hyperdx.tasks.*.

Made-with: Cursor
- Rewrite UPGRADE.md to match the actual current values structure
- Update README OTEL description for shared ConfigMap/Secret
- Add charts/*/charts/*.tgz to .gitignore and remove tracked tarballs
- Reorganize templates into hyperdx/, clickhouse/, mongodb/ subdirs
- Update smoke test for operator-managed ClickHouse/MongoDB
- Create major changeset for subchart migration

Made-with: Cursor
@dhable force-pushed the dan/hdx-2025-helm-use-subchart-pattern-to-include-3rd-party-helm-charts branch from 2613d82 to 0857b64 on March 4, 2026 21:25
dhable added 13 commits March 4, 2026 15:29
The subchart dependencies (mongodb-kubernetes, opentelemetry-collector,
clickhouse-operator-helm) must be downloaded before helm install or
helm unittest. Add dependency build step to both workflows and update
the integration test values to match the new values structure.

Made-with: Cursor
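
The dependency build step from the commit above might look roughly like the following. This is a sketch under assumptions: the function name `build_and_unittest` and the chart path `charts/clickstack` are hypothetical, `helm unittest` requires the helm-unittest plugin, and the subchart repositories may need to be registered with `helm repo add` first.

```shell
# Hypothetical CI step: download subchart dependencies, then run unit tests.
# `helm unittest` is provided by the helm-unittest plugin, not by Helm itself.
build_and_unittest() {
  local chart="${1:-charts/clickstack}"   # assumed chart path
  # Stop before the tests if the dependencies cannot be fetched.
  helm dependency build "$chart" && helm unittest "$chart"
}
```
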
Create a new clickstack-operators chart that bundles the MongoDB and
ClickHouse operator subcharts. This must be installed before the main
clickstack chart so that CRDs are registered before CRs are created.

This fixes the Helm CRD ordering issue where operator CRDs installed
via subchart templates are not yet registered when the parent chart
tries to create custom resources in the same release.

Made-with: Cursor
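
The install order this commit introduces can be sketched as two releases applied in sequence. The release names, the chart paths, and the function name `install_clickstack` are assumptions; the commit only states that clickstack-operators must be installed before clickstack.

```shell
# Hypothetical two-step install: operators (and their CRDs) first, then the stack.
install_clickstack() {
  local ops_chart="${1:-charts/clickstack-operators}"  # assumed chart path
  local app_chart="${2:-charts/clickstack}"            # assumed chart path
  # --wait blocks until the operator workloads report ready; abort if it fails
  # so that no custom resources are created without their CRDs.
  helm install clickstack-operators "$ops_chart" --wait || return 1
  helm install clickstack "$app_chart"
}
```

Splitting the releases means the CRDs are already registered with the API server by the time the second release submits its custom resources.
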
These values are no longer referenced in any template since MongoDB
and ClickHouse are operator-managed. Storage class and PVC lifecycle
are now configured directly in the operator CR specs. Document PVC
retention behavior and storage class migration in README and UPGRADE
guide with links to operator docs.

Made-with: Cursor
The ClickHouse Operator does not define a 'readonly' profile by
default, causing the server to crash on startup. The app user's
read-only semantics are already enforced via grants.

Made-with: Cursor
…ice name

Service endpoints (CLICKHOUSE_ENDPOINT, MONGO_URI, OTEL_EXPORTER_OTLP_ENDPOINT,
etc.) were hardcoded in configmap.yaml via helpers, making them impossible to
override for users with external services. Move all computed endpoints into
hyperdx.config as tpl-rendered defaults so they can be overridden in values.yaml.

Also fix clickstack.clickhouse.svc helper to append "-clickhouse" suffix,
matching the actual service name the ClickHouse Operator creates.

Made-with: Cursor
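
With the endpoints moved into hyperdx.config, an override for an external ClickHouse could be generated as below. This is a sketch only: it assumes the value is keyed by its environment variable name (as with hyperdx.config.FRONTEND_URL mentioned elsewhere in this PR), and the helper name `write_external_clickhouse_values` is hypothetical. The endpoint format expected by the chart is not stated here, so the caller supplies it.

```shell
# Hypothetical helper: write a values override that points HyperDX at an
# external ClickHouse. The key name under hyperdx.config is an assumption.
write_external_clickhouse_values() {
  local out="$1" endpoint="$2"
  cat > "$out" <<EOF
hyperdx:
  config:
    CLICKHOUSE_ENDPOINT: "${endpoint}"
EOF
}
```

The resulting file can be passed to `helm install` or `helm upgrade` with `-f`.
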
…lags

The install notes referenced stale advice about using operators separately
and had incorrect disable flags. Updated to reflect the current subchart
architecture and document the correct enabled flags for each component.

Made-with: Cursor
The OTEL collector's seed step fails with DNS errors when ClickHouse
is not yet registered in CoreDNS. Add kubectl wait for ClickHouseCluster
and MongoDBCommunity readiness after helm install, giving DNS time to
propagate before the collector retries.

Also remove stale hyperdx.frontendUrl from test-values (moved to
hyperdx.config.FRONTEND_URL which defaults correctly).

Made-with: Cursor
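
The readiness wait added in this commit could be wrapped in a small helper like the one below. It is a sketch: the function name `wait_for_cr` is hypothetical, and the wait expression is left as a parameter because the commit does not say which condition or status field each operator's CR exposes.

```shell
# Hypothetical helper: block until a custom resource satisfies a wait
# expression such as condition=Ready or jsonpath={.status.phase}=Running.
# Which expression each CR supports is an assumption to verify against
# the operator documentation.
wait_for_cr() {
  local resource="$1" expr="$2" timeout="${3:-300s}"
  kubectl wait "$resource" --for="$expr" --timeout="$timeout"
}
```

A call might look like `wait_for_cr mongodbcommunity/<name> 'jsonpath={.status.phase}=Running'`, where both the resource name and the phase value are assumptions about the deployed CR.
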
The ClickHouse Operator only creates a headless service
({CR}-clickhouse-headless), not a regular ClusterIP service. The
previous helper generated a non-existent DNS name, causing the OTEL
collector seed step to fail with "no such host" and CrashLoopBackOff.

Made-with: Cursor
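
The naming rule in this commit can be illustrated with a one-line helper. The `{CR}-clickhouse-headless` pattern comes from the commit message; the function name `clickhouse_headless_host`, the `default` namespace fallback, and the `cluster.local` cluster domain are assumptions.

```shell
# Build the in-cluster DNS name of the headless service created by the
# ClickHouse Operator: {CR}-clickhouse-headless, qualified by namespace.
# Assumes the default cluster domain, cluster.local.
clickhouse_headless_host() {
  local cr_name="$1" namespace="${2:-default}"
  printf '%s-clickhouse-headless.%s.svc.cluster.local\n' "$cr_name" "$namespace"
}
```
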
The OTLP HTTP receiver on port 4318 isn't available until the OpAMP
supervisor receives its pipeline config from the HyperDX app. Replace
the nc pre-check with curl --retry flags so data ingestion requests
retry through the startup delay instead of failing immediately.

Made-with: Cursor
kubectl port-forward terminates when the target port isn't listening
inside the pod. The OTLP HTTP receiver (4318) doesn't bind until the
OpAMP supervisor fetches its pipeline config from the HyperDX app,
which can take minutes after pod readiness.

Replace the single port-forward + curl --retry approach with a
send_otlp() helper that starts a fresh tunnel on every attempt.
Make ingestion failures non-fatal since OpAMP config propagation
may exceed the retry budget in resource-constrained CI environments.

Verified with act locally: job passes with warnings on data ingestion.

Made-with: Cursor
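
The fresh-tunnel-per-attempt idea behind send_otlp() can be sketched as follows. This is a hypothetical reconstruction, not the script from the PR: the argument order, the `PF_SETTLE_SECONDS` variable, the retry defaults, and the use of the standard OTLP/HTTP `/v1/logs` path are all assumptions.

```shell
# Hypothetical reconstruction of the retry idea; not the PR's actual script.
# Each attempt opens a new port-forward, because a tunnel to a port that is
# not yet listening is torn down and cannot be reused.
send_otlp() {
  local target="$1" payload="$2" attempts="${3:-5}" delay="${4:-10}"
  local settle="${PF_SETTLE_SECONDS:-2}"   # time allowed for the tunnel to come up
  local i pf_pid rc
  for ((i = 1; i <= attempts; i++)); do
    kubectl port-forward "$target" 4318:4318 >/dev/null 2>&1 &
    pf_pid=$!
    sleep "$settle"
    rc=0
    curl -sf -X POST -H 'Content-Type: application/json' \
      --data "$payload" http://localhost:4318/v1/logs >/dev/null || rc=$?
    # Always tear the tunnel down so the next attempt starts clean.
    kill "$pf_pid" 2>/dev/null || true
    wait "$pf_pid" 2>/dev/null || true
    if [ "$rc" -eq 0 ]; then
      return 0
    fi
    sleep "$delay"
  done
  return 1
}
```

To keep ingestion failures non-fatal, as the commit describes, the caller can log a warning instead of exiting, for example `send_otlp "$target" "$payload" || echo "warning: OTLP ingestion did not succeed"`.
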
The OTLP data ingestion test never validated end-to-end delivery (it
only checked the HTTP status from the collector, not whether data
reached ClickHouse). Worse, the OpAMP supervisor consistently fails
to receive its pipeline config in time during CI, so the OTLP HTTP
receiver on port 4318 never starts -- making the retries burn ~6.5
minutes of dead CI time per run.

Remove the ingestion section and the 30-second data wait. The smoke
test still validates pod readiness, HyperDX UI, OTEL collector
metrics, and database CR health.

Made-with: Cursor
@dhable requested review from teeohhem and wrn14897 on March 5, 2026 23:56