Datadog's official ClickHouse integration does not work with ClickHouse Cloud. This repo fills that gap -- metrics and logs from ClickHouse Cloud into Datadog, installable in under 10 minutes.
Metrics -- all ClickHouse server metrics (queries, connections, memory, merges, storage, throughput) via the built-in Datadog OpenMetrics check. No custom code, just a config file pointing at the ClickHouse Cloud Prometheus API.
Logs -- query logs from system.query_log and server error/warning logs from system.text_log, shipped to Datadog Logs via a custom Python check. Includes automatic severity mapping:
| Source | Level | Condition |
|---|---|---|
| query_log | info | Normal completed query |
| query_log | warning | Duration exceeds slow query threshold |
| query_log | error | Query failed with exception |
| text_log | warning / error / critical | Mapped from ClickHouse Warning / Error / Fatal |
Each query log entry carries structured attributes: query_id, user, duration_ms, memory_bytes, read_rows, read_bytes, exception, tables, and more -- all searchable and facetable in Datadog Log Explorer.
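In Python terms, the severity mapping and attribute extraction above amount to a couple of small pure functions. This is an illustrative sketch only -- function and field names are hypothetical, not the shipped check's actual code:

```python
# Sketch of the severity mapping described above. Names are illustrative.

def map_query_log_severity(row: dict, slow_query_threshold_ms: int = 5000) -> str:
    """Map a system.query_log row to a Datadog log status."""
    if row.get("exception"):
        return "error"      # query failed with an exception
    if row.get("duration_ms", 0) > slow_query_threshold_ms:
        return "warning"    # slower than the configured threshold
    return "info"           # normal completed query

# ClickHouse text_log level -> Datadog status
TEXT_LOG_LEVELS = {"Warning": "warning", "Error": "error", "Fatal": "critical"}

def build_log_attributes(row: dict) -> dict:
    """Collect the structured attributes attached to each query log entry."""
    keys = ("query_id", "user", "duration_ms", "memory_bytes",
            "read_rows", "read_bytes", "exception", "tables")
    return {k: row[k] for k in keys if k in row}
```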
- Linux VM with Datadog Agent (v7+) installed
- ClickHouse Cloud service with a Cloud API key (key ID + secret)
- Your ClickHouse Cloud service UUID (from the console URL)
- Your ClickHouse Cloud organization ID (for metrics endpoint)
# Custom check
sudo cp checks/clickhouse_cloud.py /etc/datadog-agent/checks.d/clickhouse_cloud.py
# Log check config
sudo mkdir -p /etc/datadog-agent/conf.d/clickhouse_cloud.d
sudo cp conf.d/clickhouse_cloud.d/conf.yaml.example \
  /etc/datadog-agent/conf.d/clickhouse_cloud.d/conf.yaml

# Metrics config (OpenMetrics)
sudo mkdir -p /etc/datadog-agent/conf.d/openmetrics.d
sudo cp conf.d/openmetrics.d/conf.yaml.example \
  /etc/datadog-agent/conf.d/openmetrics.d/conf.yaml

sudo chown dd-agent:dd-agent /etc/datadog-agent/checks.d/clickhouse_cloud.py
sudo chown dd-agent:dd-agent /etc/datadog-agent/conf.d/clickhouse_cloud.d/conf.yaml
sudo chown dd-agent:dd-agent /etc/datadog-agent/conf.d/openmetrics.d/conf.yaml
sudo chmod 644 /etc/datadog-agent/checks.d/clickhouse_cloud.py
sudo chmod 640 /etc/datadog-agent/conf.d/clickhouse_cloud.d/conf.yaml
sudo chmod 640 /etc/datadog-agent/conf.d/openmetrics.d/conf.yaml

Edit /etc/datadog-agent/conf.d/clickhouse_cloud.d/conf.yaml:
init_config:

instances:
  - service_id: "<your-service-uuid>"
    key_id: "<your-api-key-id>"
    key_secret: "<your-api-key-secret>"
    cluster_name: "<your-cluster-name>"  # appears as "service" in Datadog Logs
    collect_query_logs: true
    collect_text_logs: true
    tags:
      - "env:production"
      - "clickhouse_cluster:<your-cluster-name>"

All three credential fields (service_id, key_id, key_secret) are required. Everything else has sensible defaults.
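A minimal sketch of how the three required credential fields might be validated when the check starts (function name hypothetical; the actual check's error type and message may differ):

```python
# Required credential fields per instance, as documented above.
REQUIRED_FIELDS = ("service_id", "key_id", "key_secret")

def validate_instance(instance: dict) -> None:
    """Raise if any required credential field is missing or empty."""
    missing = [f for f in REQUIRED_FIELDS if not instance.get(f)]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
```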
| Parameter | Default | Range | Description |
|---|---|---|---|
| log_batch_size | 1000 | 1 -- 10000 | Max rows fetched per check run |
| slow_query_threshold_ms | 5000 | 0 -- 3600000 | Queries slower than this are logged as warning |
| initial_backfill_minutes | 60 | 1 -- 1440 | How far back to look on first run (avoids flooding Datadog with history) |
| query_timeout_seconds | 30 | 5 -- 300 | HTTP timeout for each ClickHouse Cloud API call |
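The defaults and ranges above could be applied like this. Note the clamping behavior here is an assumption for illustration -- the real check may instead reject out-of-range values:

```python
# Defaults and ranges mirroring the parameter table above: name -> (default, min, max).
PARAM_SPECS = {
    "log_batch_size":           (1000, 1, 10000),
    "slow_query_threshold_ms":  (5000, 0, 3600000),
    "initial_backfill_minutes": (60, 1, 1440),
    "query_timeout_seconds":    (30, 5, 300),
}

def resolve_params(instance: dict) -> dict:
    """Apply defaults, then clamp user-supplied values into documented ranges."""
    resolved = {}
    for name, (default, lo, hi) in PARAM_SPECS.items():
        value = instance.get(name, default)
        resolved[name] = max(lo, min(hi, int(value)))
    return resolved
```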
Add one entry per service under instances in the log check config. Each instance runs independently with its own cursor, credentials, and tags:
instances:
  # Production cluster
  - service_id: "abc-123-prod"
    key_id: "<prod-key-id>"
    key_secret: "<prod-key-secret>"
    cluster_name: "prod-analytics"
    tags:
      - "env:production"
      - "clickhouse_cluster:prod-analytics"

  # Staging cluster
  - service_id: "def-456-staging"
    key_id: "<staging-key-id>"
    key_secret: "<staging-key-secret>"
    cluster_name: "staging-analytics"
    collect_text_logs: false  # only query logs for staging
    log_batch_size: 500
    tags:
      - "env:staging"
      - "clickhouse_cluster:staging-analytics"

For metrics, add a separate entry per service in the OpenMetrics config as well (each with its own org/endpoint). Use cluster_name and tags to distinguish services in Datadog dashboards and log facets.
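Keeping instances independent comes down to namespacing each instance's cursor by its service. A sketch of how a per-instance cursor key might be derived (the key format here is illustrative, not the check's actual cache key):

```python
def cursor_cache_key(service_id: str, table: str) -> str:
    """Derive a per-instance, per-table cursor key so multiple instances
    (e.g. prod and staging) never share persisted state."""
    return f"clickhouse_cloud.cursor.{service_id}.{table}"
```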
Edit /etc/datadog-agent/conf.d/openmetrics.d/conf.yaml:
instances:
  - openmetrics_endpoint: "https://api.clickhouse.cloud/v1/organizations/<your-org-id>/prometheus?filtered_metrics=true"
    namespace: "clickhouse"
    username: "<your-key-id>"
    password: "<your-key-secret>"
    tls_verify: true
    honor_labels: true
    metrics:
      - "^ClickHouse.*"
    tags:
      - "env:production"
      - "clickhouse_cluster:<your-cluster-name>"

The default "^ClickHouse.*" pattern collects all exposed metrics. To reduce volume, replace it with a specific allowlist (commented examples are in the template).
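A rough Python analogue of the allowlist behavior, to show what the regex patterns select (the Agent's exact matching semantics may differ in edge cases):

```python
import re

def filter_metrics(names: list, patterns: list) -> list:
    """Keep only metric names matching at least one allowlist regex."""
    compiled = [re.compile(p) for p in patterns]
    return [n for n in names if any(rx.search(n) for rx in compiled)]
```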
# Enable log collection in the main Datadog config (if not already enabled)
sudo sed -i 's/# logs_enabled: false/logs_enabled: true/' /etc/datadog-agent/datadog.yaml
sudo systemctl restart datadog-agent

# Dry-run the custom check
sudo datadog-agent check clickhouse_cloud

# Check overall agent status
sudo datadog-agent status

Look for clickhouse_cloud in the Checks section and openmetrics with your ClickHouse endpoint in the output.
The Datadog Agent's built-in OpenMetrics check scrapes the ClickHouse Cloud Prometheus API on each interval. No custom code -- config only.
sequenceDiagram
    participant Agent as Datadog Agent
    participant CH as ClickHouse Cloud<br/>Prometheus API
    participant DD as Datadog

    loop Every check interval
        Agent->>CH: GET /v1/organizations/{org}/prometheus
        CH-->>Agent: Prometheus metrics (text)
        Agent->>DD: Forward metrics via Metrics API
    end
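For intuition, here is a deliberately simplified parser for the Prometheus text format the endpoint returns. The Agent's real OpenMetrics parser is far more complete -- this sketch ignores comments, trailing timestamps, and escaped label values:

```python
def parse_prometheus_text(payload: str) -> list:
    """Parse Prometheus text-format lines into (name, labels, value) tuples.
    Simplified: skips comment lines; assumes no trailing timestamps or
    escaped characters inside label values."""
    samples = []
    for line in payload.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        metric, _, value = line.rpartition(" ")
        name, _, labelpart = metric.partition("{")
        labels = {}
        if labelpart:
            for pair in labelpart.rstrip("}").split(","):
                k, _, v = pair.partition("=")
                labels[k] = v.strip('"')
        samples.append((name, labels, float(value)))
    return samples
```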
The custom Python check polls ClickHouse Cloud system tables via the Query API, tracks its position with a microsecond-precision cursor, and ships new rows to Datadog Logs.
sequenceDiagram
    participant Agent as Datadog Agent<br/>(custom check)
    participant Cache as Persistent Cache
    participant CH as ClickHouse Cloud<br/>Query API
    participant DD as Datadog

    loop Every check interval
        Agent->>Cache: Read last cursor (event_time_microseconds)
        Cache-->>Agent: cursor value (or empty on first run)
        Agent->>CH: POST /service/{id}/run<br/>SELECT ... FROM system.query_log<br/>WHERE event_time_microseconds > {cursor}
        CH-->>Agent: New query_log rows (JSON)
        Agent->>CH: POST /service/{id}/run<br/>SELECT ... FROM system.text_log<br/>WHERE event_time_microseconds > {cursor}
        CH-->>Agent: New text_log rows (JSON)
        loop For each row
            Agent->>DD: send_log() with severity + structured attributes
        end
        Agent->>Cache: Write new cursor (last row's microsecond timestamp)
    end
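The cursor logic in the diagram above can be sketched in a few lines of Python. Names are hypothetical; `fetch_rows` and `emit_log` stand in for the Query API call and the Datadog log submission:

```python
# Sketch of cursor-based incremental collection (at-least-once delivery).

def collect_new_rows(cache: dict, table: str, fetch_rows, emit_log) -> int:
    """Fetch rows newer than the stored cursor, emit them as logs, then
    advance the cursor to the last row's microsecond timestamp."""
    cursor = cache.get(table, 0)  # empty on first run
    rows = fetch_rows(table, cursor)
    for row in rows:
        emit_log(row)
    if rows:
        # Advance only AFTER all logs are emitted: a crash in between can
        # cause duplicates on the next run, but never skipped rows.
        cache[table] = rows[-1]["event_time_microseconds"]
    return len(rows)
```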
- At-least-once delivery -- the cursor advances only after logs are emitted. Duplicate delivery is preferred over log loss.
- Automatic retries -- transient HTTP errors (502/503/504) are retried twice with exponential backoff before reporting failure.
- No extra dependencies -- uses only `requests` and `urllib3`, both bundled with the Datadog Agent.
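The retry behavior can be sketched as follows. This is a simplified model -- the exact delays and give-up policy in the check may differ; `do_request` stands in for the real HTTP call:

```python
import time

RETRYABLE = {502, 503, 504}  # transient HTTP errors worth retrying

def request_with_retry(do_request, retries: int = 2, base_delay: float = 1.0) -> int:
    """Call do_request(), retrying transient errors up to `retries` times
    with exponential backoff (base_delay, 2*base_delay, ...)."""
    for attempt in range(retries + 1):
        status = do_request()
        if status not in RETRYABLE:
            return status
        if attempt < retries:
            time.sleep(base_delay * (2 ** attempt))
    return status  # still failing after all retries
```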
No logs appearing in Datadog
- Run `sudo datadog-agent check clickhouse_cloud` and look for errors in the output.
- Confirm `logs_enabled: true` in /etc/datadog-agent/datadog.yaml.
- Verify your API key has access to the Cloud Query API.
No metrics appearing
- Run `sudo datadog-agent status` and look for the `openmetrics` check.
- Verify the Prometheus endpoint URL, org ID, and credentials.
- Try curling the endpoint directly:
curl -u '<key_id>:<key_secret>' 'https://api.clickhouse.cloud/v1/organizations/<org-id>/prometheus?filtered_metrics=true'
Service check reporting CRITICAL
- The check cannot reach the ClickHouse Cloud Query API. Check network/firewall rules, DNS resolution, and credential validity.
Duplicate logs after agent restart
- Expected in rare cases. The cursor is persisted to disk, but if the agent crashes between emitting logs and writing the cursor, a small overlap is possible. This is by design.
Reset cursors to re-collect historical data
sudo systemctl stop datadog-agent
sudo rm -f /opt/datadog-agent/run/clickhouse_cloud*
sudo systemctl start datadog-agent