Description
Priority Level
Medium (Nice to have)
Is your feature request related to a problem? Please describe.
Downstream tools and pipelines that build on Data Designer currently have no programmatic way to access per-call telemetry (model used, token counts, latency, retries, failure reasons). The only signals available are human-readable log messages, which means consumers must either:
- Parse log strings with brittle regex to extract stats
- Set DD's log level to INFO/DEBUG and hope the right messages propagate
- Infer failures indirectly (e.g. by diffing input vs output record IDs)
This makes it difficult to build observability dashboards, compute cost estimates, track model usage across runs, or diagnose slow pipelines — all without resorting to fragile log parsing.
Describe the solution you'd like
Attach structured metadata to Python `LogRecord` objects via the `extra` parameter at key call sites. This requires no new dependencies and no changes to existing log output — the human-readable messages stay exactly as they are.
Concretely, add a namespaced `extra` dict (e.g. `dd_event`) to log calls at these sites:
- Per-LLM call — model alias, input/output tokens, latency, retries, status (success/failed/filtered), error message if any
- Record failure — record ID, column, failure reason, attempt number
- Model usage summary — aggregate token counts and request counts per model at workflow end
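As a rough sketch of what those three event shapes could look like as `TypedDict`s (field names beyond those listed above are illustrative, extrapolated from this list — not an existing DD schema):

```python
from typing import Optional, TypedDict

class LLMCallEvent(TypedDict):
    """Emitted once per LLM call; 'type' is always 'llm_call'."""
    type: str
    model: str
    column: str
    record_id: str
    input_tokens: int
    output_tokens: int
    latency_ms: int
    retries: int
    status: str            # "success" | "failed" | "filtered"
    error: Optional[str]   # populated when status == "failed"

class RecordFailureEvent(TypedDict):
    """Emitted when a record fails; 'type' is always 'record_failure'."""
    type: str
    record_id: str
    column: str
    reason: str
    attempt: int

class ModelUsageSummaryEvent(TypedDict):
    """Emitted per model at workflow end; 'type' is 'model_usage_summary'."""
    type: str
    model: str
    request_count: int
    input_tokens: int
    output_tokens: int
```

These are plain dicts at runtime, so consumers that don't care about typing can ignore the classes entirely.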
Example:

```python
# Existing log line stays unchanged for human readers
logger.debug(
    "LLM call to %s completed in %dms",
    model_alias, latency_ms,
    extra={"dd_event": {
        "type": "llm_call",
        "model": model_alias,
        "column": column_name,
        "record_id": record_id,
        "input_tokens": 285,
        "output_tokens": 130,
        "latency_ms": 340,
        "retries": 0,
        "status": "success",
    }},
)
```

Consumers attach a lightweight handler to collect events:
```python
class EventCollector(logging.Handler):
    def __init__(self):
        super().__init__()
        self.events = []

    def emit(self, record):
        event = getattr(record, "dd_event", None)
        if event:
            self.events.append(event)
```

This enables several use cases without DD needing to expose new public API:
- Live monitoring — attach a handler during `create()`/`preview()` to stream events as they happen
- Offline analytics — write events to JSONL for post-hoc cost analysis, latency profiling, or model comparison
- Observability integration — forward events to Prometheus, OpenTelemetry, or custom dashboards
- Debugging — filter events by record ID to trace a single record's journey through the pipeline
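The offline-analytics path, for instance, needs nothing but the stdlib. A minimal end-to-end sketch (the logger name `"data_designer"` is illustrative, not a confirmed DD internal; the `logger.debug` call simulates what DD itself would emit):

```python
import json
import logging

class EventCollector(logging.Handler):
    """Collects structured dd_event payloads attached via `extra`."""
    def __init__(self):
        super().__init__()
        self.events = []

    def emit(self, record):
        event = getattr(record, "dd_event", None)
        if event:
            self.events.append(event)

# Logger name is illustrative -- attach to whatever logger DD emits on.
logger = logging.getLogger("data_designer")
logger.setLevel(logging.DEBUG)
collector = EventCollector()
logger.addHandler(collector)

# Simulate the per-call log DD would emit internally.
logger.debug(
    "LLM call to %s completed in %dms", "gpt-x", 340,
    extra={"dd_event": {"type": "llm_call", "model": "gpt-x", "latency_ms": 340}},
)

# Post-hoc: dump collected events to JSONL for cost/latency analysis.
with open("events.jsonl", "w") as f:
    for event in collector.events:
        f.write(json.dumps(event) + "\n")
```

Because the handler ignores records without a `dd_event` attribute, it can be attached alongside existing console or file handlers without affecting them.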
Describe alternatives you've considered
- Structured fields on `CreateResult`/`PreviewResult` (e.g. `result.model_usage`, `result.failed_records`). This would be cleaner for aggregate stats but doesn't support live streaming and requires new public API surface. Could complement the logging approach for summary data.
- Callback / event hook parameter on `create()`/`preview()` (e.g. `on_event=my_handler`). More explicit contract with typed event dataclasses, but a larger API change and less idiomatic Python than logging.
- Adopting `structlog` for structured logging throughout DD. Powerful but adds a dependency and is a bigger architectural change. The `extra` approach is a stepping stone that's compatible with a future structlog migration.
Additional context
Using a single namespaced key (`dd_event`) rather than flat `extra` keys avoids collisions with `LogRecord`'s built-in attributes and with any other libraries that use `extra`. The convention is simple: if `dd_event` is present on a record, it contains a typed dict with a `"type"` discriminator field.
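The discriminator makes per-type dispatch trivial on the consumer side. A sketch of a handler that routes events to registered callbacks (all names here are illustrative, not proposed DD API):

```python
import logging

class TypedEventHandler(logging.Handler):
    """Routes dd_event payloads to callbacks registered per 'type'."""
    def __init__(self):
        super().__init__()
        self._callbacks = {}

    def on(self, event_type, callback):
        self._callbacks.setdefault(event_type, []).append(callback)

    def emit(self, record):
        event = getattr(record, "dd_event", None)
        if event:
            for cb in self._callbacks.get(event.get("type"), []):
                cb(event)

failures = []

logger = logging.getLogger("dd_demo")  # logger name illustrative
logger.setLevel(logging.DEBUG)
handler = TypedEventHandler()
handler.on("record_failure", lambda e: failures.append(e["record_id"]))
logger.addHandler(handler)

# Only the record_failure event reaches the callback; llm_call is ignored.
logger.debug("record failed", extra={"dd_event": {"type": "record_failure", "record_id": "r-42"}})
logger.debug("llm call ok", extra={"dd_event": {"type": "llm_call", "model": "m"}})
```

Events without a registered callback are silently dropped, so consumers opt in to exactly the event types they care about.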
This is a non-breaking, additive change — existing users who don't attach a custom handler see zero difference in behavior or output.