Understanding datadog_agent source internal metrics and logs #23897
Question

Hi Vector team and everyone, may I check and learn more about the internal metrics and logs related to the datadog_agent source? A small bit of context: I've recently rolled out Vector to receive and handle a large amount of metrics traffic from our internal Datadog agents (configured using the observability_pipelines_worker option). I've also built fairly extensive dashboards to monitor Vector, and I noticed periodic "Events dropped" entries in the internal metrics and logs coming from this source (an example log is included below).
So I'm trying to understand the real reason behind these drops. I've included only the most important parts of the configuration below to avoid pasting too much. Generally, I don't see any errors or discarded events on the sink side. Thank you for spending the time, I know this is a lot of context.

Vector Config

There is a fair amount of context behind the config below, though I'm open to recommendations:
data_dir: "/var/lib/vector"
sources:
  datadog_agents:
    type: datadog_agent
    address: 0.0.0.0:9000
    disable_logs: true
    disable_traces: true
    store_api_key: true
sinks:
  datadog_platform:
    inputs:
      - datadog_agents
    type: datadog_metrics
    default_api_key: "${DD_API_KEY:?missing DD_API_KEY as default}"
    buffer:
      - type: memory
        max_events: 400000
        when_full: block
    batch:
      max_events: 1000
      timeout_secs: 1
    request:
      retry_attempts: 10
      retry_initial_backoff_secs: 1
      retry_max_duration_secs: 10
      retry_jitter_mode: Full
      timeout_secs: 60
      concurrency: adaptive
      adaptive_concurrency:
        initial_concurrency: 600
        max_concurrency_limit: 1500
        decrease_ratio: 1.0
        rtt_deviation_scale: 3.0
        ewma_alpha: 0.4

Vector Logs

{
"id": <obscured>
"content": {
"timestamp": "2025-10-01T10:52:09.060Z",
"tags": [
"source:undefined",
"datadog.submission_auth:api_key",
"datadog.api_key_uuid:504665cb-9c53-496d-b3c6-552e2eb73468"
],
"host": "<obscured>",
"service": "<obscured>",
"message": "Events dropped",
"attributes": {
"reason": "Source send cancelled.",
"metadata": {
"kind": "event",
"module_path": "vector_common::internal_event::component_events_dropped",
"target": "vector_common::internal_event::component_events_dropped"
},
"intentional": false,
"count": 3044,
"internal_log_rate_limit": true,
"pid": 998,
"source_type": "internal_logs",
"env": "staging",
"hostname": "<obscured>",
"appname": "<obscured>",
"service": "<obscured>",
"vector": {
"component_type": "datadog_agent",
"component_id": "datadog_agents",
"component_kind": "source"
},
"status": "ERROR",
"timestamp": 1759315929060
}
}
}
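For reference, internal telemetry like the log above is collected with Vector's built-in internal_logs and internal_metrics sources. A minimal sketch is below — not my exact configuration; the component names and exporter address are illustrative:

sources:
  vector_internal_metrics:
    type: internal_metrics

sinks:
  vector_prometheus:
    # Exposes internal metrics such as component_discarded_events_total,
    # buffer_events, and utilization for the dashboards mentioned above.
    type: prometheus_exporter
    inputs:
      - vector_internal_metrics
    address: 0.0.0.0:9598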
Replies: 1 comment
Hi @techministrator, apologies for the late response. This is a very interesting thread.

Regarding the event drops, the "Source send cancelled" errors are occurring within Vector, not at the upstream Datadog agents. Timeline:

1. The datadog_agent source accepts incoming metrics requests from the agents.
2. The datadog_metrics sink cannot drain events as fast as they arrive.
3. With buffer.when_full = "block", Vector's buffer fills up and blocks new incoming requests, so the source's pending sends get cancelled and are reported as "Events dropped".

Is data lost? Yes. The Datadog agents receive HTTP 200 OK acknowledgements for the payloads they already submitted, so they do not resend the events that were dropped inside Vector.

Did you consider reducing the batch size to potentially alleviate some ingestion issues? There is a maximum payload limit for Datadog metrics requests, and large batches push individual requests closer to it.
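As an illustration of that suggestion, here is a minimal sketch of the sink with a reduced batch size (500 is an arbitrary example value, not a tuned recommendation; the rest mirrors your existing config):

sinks:
  datadog_platform:
    inputs:
      - datadog_agents
    type: datadog_metrics
    default_api_key: "${DD_API_KEY:?missing DD_API_KEY as default}"
    batch:
      # Smaller batches keep each request further from the maximum payload limit.
      max_events: 500   # illustrative; was 1000
      timeout_secs: 1
    buffer:
      # Unchanged: with when_full: block, a full buffer is what propagates
      # backpressure to the source and leads to the cancelled sends above.
      - type: memory
        max_events: 400000
        when_full: block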