Summary
PR #116 fixes #113 by adding async generator support to @control().
That fix is useful and pragmatic, but it implements only one streaming model: post-facto evaluation. Chunks are yielded to the caller in real time, and the post-stage control check runs only after the stream completes normally.
This issue tracks two follow-ups:
- Document and test the current post-facto behavior clearly.
- Design opt-in support for additional streaming evaluation modes.
Background
There are three broad ways to evaluate streamed LLM responses, each with a different safety vs latency tradeoff.
| Mode | Behavior | Safety | UX | Complexity |
|---|---|---|---|---|
| buffer | Hold the full response, evaluate, then release | Strong | Poor for streaming | Low |
| incremental | Evaluate during streaming as output arrives | Medium | Good | High |
| post_facto | Stream immediately, evaluate after completion | Weak for blocking, useful for audit/redaction | Good | Low |
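As a rough sketch of how the buffer and post_facto orderings differ, the following contrasts the two (all names here are illustrative stand-ins, not SDK API):

```python
# Illustrative contrast between buffer and post_facto orderings.
# `model_stream` and `evaluate` are placeholders, not SDK API.
import asyncio

async def model_stream():
    for chunk in ["Hello", ", ", "world"]:
        yield chunk

def evaluate(text: str) -> bool:
    # Placeholder control check: deny nothing.
    return True

async def buffer_mode():
    # Hold the full response, evaluate once, then release everything.
    chunks = [c async for c in model_stream()]
    if evaluate("".join(chunks)):
        for c in chunks:
            yield c

async def post_facto_mode():
    # Stream immediately; evaluate only after the stream completes.
    seen = []
    async for c in model_stream():
        seen.append(c)
        yield c
    evaluate("".join(seen))  # result arrives too late to block output

async def collect(gen):
    return [c async for c in gen]

buffered = asyncio.run(collect(buffer_mode()))
streamed = asyncio.run(collect(post_facto_mode()))
```

Both paths emit the same chunks here; the difference is that buffer mode can withhold everything on a deny, while post_facto has already emitted the output by the time the check runs.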
PR #116 implements post_facto semantics for the @control() decorator path.
Current Behavior
- `@control()` now works on async generator functions and no longer crashes on common streaming patterns.
- Pre-stage controls run before the first chunk is yielded.
- If a pre-stage control denies execution, no chunks are yielded.
- Streamed chunks are yielded immediately as they are produced.
- Post-stage controls run on the accumulated output only after normal stream completion.
- If the caller stops consuming early, disconnects, or the task is cancelled, no final post-stage evaluation runs.
- Users can manually implement streaming-time evaluation today by calling the public `agent_control.evaluate_controls()` API inside their own stream loop.
This behavior is reasonable for parity with the existing pre/post execution model, but it should be explicit in docs and tests.
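A minimal sketch of the wrapper shape described above, with `run_pre_controls` / `run_post_controls` as illustrative placeholders rather than the SDK's actual internals:

```python
# Sketch of post-facto async-generator wrapping. The control hooks are
# hypothetical placeholders; the real @control() signature differs.
import asyncio
import functools

def control(run_pre_controls, run_post_controls):
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            if not run_pre_controls():
                return  # pre-stage deny: no chunks are yielded
            accumulated = []
            async for chunk in fn(*args, **kwargs):
                accumulated.append(chunk)
                yield chunk  # chunks pass through immediately
            # Reached only on normal completion; an early break,
            # disconnect, or cancellation skips this line entirely.
            run_post_controls("".join(accumulated))
        return wrapper
    return decorator

events = []

@control(lambda: True, lambda text: events.append(("post", text)))
async def generate():
    for c in ["a", "b", "c"]:
        yield c

async def full_consume():
    return [c async for c in generate()]

async def partial_consume():
    agen = generate()
    first = await agen.__anext__()
    await agen.aclose()  # caller stops early: post-stage never runs
    return first

out = asyncio.run(full_consume())
first = asyncio.run(partial_consume())
```

Note that post-stage evaluation fires only when the `async for` loop finishes normally; an early `aclose()` (early stop, disconnect, or cancellation) skips it, matching the bullet list above.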
Existing Manual Workaround
Applications that own the streaming loop can already perform custom streaming evaluation without SDK changes by calling agent_control.evaluate_controls() directly during generation.
That is useful, but it is not the same as first-class SDK support:
- `@control()` does not perform incremental evaluation automatically.
- The application must decide buffering strategy, evaluation cadence, and deny/steer handling.
- The current framework-native integrations do not yet provide a built-in incremental enforcement path.
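For illustration, an application-owned loop of this shape can deny mid-stream today. Here `evaluate_controls` is a simplified stand-in; the real `agent_control.evaluate_controls()` signature and return shape may differ:

```python
# Hand-rolled streaming evaluation in an application-owned loop.
# evaluate_controls() below is a stand-in, not the SDK's real API.
import asyncio

def evaluate_controls(text: str) -> dict:
    # Stand-in policy: deny once a blocked token appears.
    return {"action": "deny" if "secret" in text else "allow"}

async def model_stream():
    for chunk in ["public ", "data ", "secret ", "leak"]:
        yield chunk

async def guarded_stream():
    accumulated = ""
    async for chunk in model_stream():
        accumulated += chunk
        # The application chooses the cadence: here, every chunk.
        if evaluate_controls(accumulated)["action"] == "deny":
            return  # stop before emitting the offending chunk
        yield chunk

async def main():
    return [c async for c in guarded_stream()]

emitted = asyncio.run(main())
```

This demonstrates the tradeoff named above: the application, not the SDK, owns the buffering strategy, evaluation cadence, and deny handling.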
Integration Support Today
Streaming evaluation support currently varies by integration method:
- `@control()` decorator: supports pre-stage blocking and post-facto post-stage evaluation after #116.
- Framework-native integrations: support varies by framework.
- `evaluate_controls()` API: can be used manually for custom streaming evaluation in application-owned stream loops.
Why This Matters
For many LLM applications, streaming is the primary response path. Post-facto evaluation is not sufficient for strict blocking guardrails, because output has already been emitted before the post-stage result is known.
Post-facto evaluation is still useful for:
- audit logging
- retroactive UI redaction
- analytics and observability
Applications that require pre-emission safety guarantees need either the buffer or the incremental mode.
Phase 1: Document and Harden Current Behavior
Small follow-up work after #116:
- Add a test proving that partial stream consumption skips the post-stage check.
- Add an inline comment in the async-generator wrapper noting that post-stage evaluation only runs after full consumption.
- Update the `control()` docstring to clarify that streaming post-stage checks are post-facto, not pre-yield enforcement.
Phase 2: Design Opt-In Streaming Evaluation Modes
Longer term, users may need a choice of streaming behavior rather than a single default.
Possible direction:
```python
@control(streaming="buffer")
@control(streaming="incremental")
@control(streaming="post_facto")
```

The exact API is open. The important design question is whether Agent Control should support multiple streaming modes explicitly and, if so, which ones.
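One possible shape for such a parameter, purely illustrative since the issue leaves the real API open (the wrapper selection itself is elided; only mode validation and tagging are shown):

```python
# Illustrative-only sketch of an opt-in streaming parameter.
# Names and behavior are assumptions, not a committed design.
VALID_MODES = {"buffer", "incremental", "post_facto"}

def control(streaming: str = "post_facto"):
    # Validate eagerly so misconfiguration fails at decoration time.
    if streaming not in VALID_MODES:
        raise ValueError(f"unknown streaming mode: {streaming!r}")
    def decorator(fn):
        # Real wrapper selection (buffering, incremental checks,
        # post-facto accumulation) would happen here; elided.
        fn._streaming_mode = streaming
        return fn
    return decorator

@control(streaming="buffer")
async def generate():
    yield "chunk"

mode = generate._streaming_mode
```

Defaulting to `post_facto` would preserve backward compatibility with the #116 behavior while making the other modes opt-in.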
Constraints For Incremental Evaluation
Incremental streaming evaluation is the most attractive mode from a UX perspective, but it has significant constraints.
- SDK-side vs server-side execution: server-side evaluation on every chunk would add substantial network, latency, cost, and telemetry overhead.
- Structured vs unstructured output: partial JSON, SQL, XML, function-call payloads, and similar structured outputs are hard to evaluate meaningfully mid-stream.
- Sliding context window: evaluating only the latest chunk misses cross-chunk patterns, while evaluating the full accumulated output on every step is repetitive and expensive. A more practical pattern is to evaluate the latest chunk plus a bounded amount of prior context.
- Control eligibility: not every evaluator or control is meaningful in incremental mode, so the feature likely needs an explicit capability boundary.
- Enforcement semantics: the design has to define when checks happen and what happens on deny or steer after some output has already been emitted.
- Observability and dedupe: repeated checks on overlapping text can create duplicate events, duplicate matches, and noisy logs unless the behavior is defined carefully.
- Chunk normalization: some frameworks stream plain text, while others stream typed delta objects, partial tool calls, or partial structured payloads. Incremental evaluation likely needs a normalization layer before control execution.
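The sliding-context bullet above can be sketched concretely. This is a hypothetical helper, not SDK API: it yields the latest chunk prefixed by a bounded amount of prior context, so cross-chunk patterns remain visible without re-evaluating the full accumulated output each step:

```python
# Hypothetical sliding-window chunking for incremental evaluation.
# The window size and helper name are illustrative assumptions.
from collections import deque

def sliding_windows(chunks, max_context_chars=20):
    """Yield (bounded prior context + latest chunk) strings."""
    context = deque()  # recently seen chunks
    size = 0
    for chunk in chunks:
        yield "".join(context) + chunk
        context.append(chunk)
        size += len(chunk)
        # Evict oldest chunks once the retained context exceeds the bound.
        while size > max_context_chars and len(context) > 1:
            size -= len(context.popleft())

windows = list(sliding_windows(["abcde"] * 6, max_context_chars=10))
```

Each window would then be passed to an eligible control; the dedupe concern above arises exactly because consecutive windows overlap.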
Scope Limitations
At least for an initial version, any incremental mode would likely need to be:
- opt-in
- limited to SDK-executed controls
- limited to text output
- limited to the `@control()` decorator path
The current framework-native integrations would require integration-specific work for true incremental enforcement. Some frameworks already expose partial-response hooks, while others expose only final post-model hooks, so the implementation path is not uniform across integrations.
Acceptance Criteria
Phase 1:
- Partial-consumption behavior is covered by a test.
- Streaming post-facto semantics are documented in code and public docs.
Phase 2:
- An RFC or design doc defines the supported streaming modes and proposed API surface.
- The team decides which modes will be supported in the first implementation.
- Supported and unsupported scenarios are documented explicitly.
- If incremental mode is pursued, its eligibility rules and enforcement semantics are defined before implementation.
Non-Goals
- Re-opening the async-generator crash fixed by #116.
- Changing the meaning of batch pre/post execution checks.
- Claiming full real-time streamed-output blocking before a concrete design exists.