Summary
PR #116 fixes #113 by adding async generator support to @control().
That fix is useful and pragmatic, but it implements only one streaming model: post-facto evaluation. Chunks are yielded to the caller in real time, and the post-stage control check runs only after the stream completes normally.
This issue tracks two follow-ups:
- Document and test the current post-facto behavior clearly.
- Design opt-in support for additional streaming evaluation modes.
Background
There are three broad ways to evaluate streamed LLM responses, each with a different safety vs latency tradeoff.
| Mode | Behavior | Safety | UX | Complexity |
|---|---|---|---|---|
| buffer | Hold the full response, evaluate, then release | Strong | Poor for streaming | Low |
| incremental | Evaluate during streaming as output arrives | Medium | Good | High |
| post_facto | Stream immediately, evaluate after completion | Weak for blocking, useful for audit/redaction | Good | Low |
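As a rough sketch of how the buffer and post_facto orderings differ, the following contrasts the two (all names here are illustrative stand-ins, not SDK API):

```python
# Illustrative contrast between buffer and post_facto orderings.
# `model_stream` and `evaluate` are placeholders, not SDK API.
import asyncio

async def model_stream():
    for chunk in ["Hello", ", ", "world"]:
        yield chunk

def evaluate(text: str) -> bool:
    # Placeholder control check: deny nothing.
    return True

async def buffer_mode():
    # Hold the full response, evaluate once, then release everything.
    chunks = [c async for c in model_stream()]
    if evaluate("".join(chunks)):
        for c in chunks:
            yield c

async def post_facto_mode():
    # Stream immediately; evaluate only after the stream completes.
    seen = []
    async for c in model_stream():
        seen.append(c)
        yield c
    evaluate("".join(seen))  # result arrives too late to block output

async def collect(gen):
    return [c async for c in gen]

buffered = asyncio.run(collect(buffer_mode()))
streamed = asyncio.run(collect(post_facto_mode()))
```

Both paths emit the same chunks here; the difference is that buffer mode can withhold everything on a deny, while post_facto has already emitted the output by the time the check runs.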
PR #116 implements post_facto semantics for the @control() decorator path.
Current Behavior
- `@control()` now works on async generator functions and no longer crashes on common streaming patterns.
- Pre-stage controls run before the first chunk is yielded.
- If a pre-stage control denies execution, no chunks are yielded.
- Streamed chunks are yielded immediately as they are produced.
- Post-stage controls run on the accumulated output only after normal stream completion.
- If the caller stops consuming early, disconnects, or the task is cancelled, no final post-stage evaluation runs.
- Users can manually implement streaming-time evaluation today by calling the public `agent_control.evaluate_controls()` API inside their own stream loop.
This behavior is reasonable for parity with the existing pre/post execution model, but it should be explicit in docs and tests.
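A minimal sketch of the wrapper shape described above, with `run_pre_controls` / `run_post_controls` as illustrative placeholders rather than the SDK's actual internals:

```python
# Sketch of post-facto async-generator wrapping. The control hooks are
# hypothetical placeholders; the real @control() signature differs.
import asyncio
import functools

def control(run_pre_controls, run_post_controls):
    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            if not run_pre_controls():
                return  # pre-stage deny: no chunks are yielded
            accumulated = []
            async for chunk in fn(*args, **kwargs):
                accumulated.append(chunk)
                yield chunk  # chunks pass through immediately
            # Reached only on normal completion; an early break,
            # disconnect, or cancellation skips this line entirely.
            run_post_controls("".join(accumulated))
        return wrapper
    return decorator

events = []

@control(lambda: True, lambda text: events.append(("post", text)))
async def generate():
    for c in ["a", "b", "c"]:
        yield c

async def full_consume():
    return [c async for c in generate()]

async def partial_consume():
    agen = generate()
    first = await agen.__anext__()
    await agen.aclose()  # caller stops early: post-stage never runs
    return first

out = asyncio.run(full_consume())
first = asyncio.run(partial_consume())
```

Note that post-stage evaluation fires only when the `async for` loop finishes normally; an early `aclose()` (early stop, disconnect, or cancellation) skips it, matching the bullet list above.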
Existing Manual Workaround
Applications that own the streaming loop can already perform custom streaming evaluation without SDK changes by calling agent_control.evaluate_controls() directly during generation.
That is useful, but it is not the same as first-class SDK support:
- `@control()` does not perform incremental evaluation automatically.
- The application must decide buffering strategy, evaluation cadence, and deny/steer handling.
- The current framework-native integrations do not yet provide a built-in incremental enforcement path.
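For illustration, an application-owned loop of this shape can deny mid-stream today. Here `evaluate_controls` is a simplified stand-in; the real `agent_control.evaluate_controls()` signature and return shape may differ:

```python
# Hand-rolled streaming evaluation in an application-owned loop.
# evaluate_controls() below is a stand-in, not the SDK's real API.
import asyncio

def evaluate_controls(text: str) -> dict:
    # Stand-in policy: deny once a blocked token appears.
    return {"action": "deny" if "secret" in text else "allow"}

async def model_stream():
    for chunk in ["public ", "data ", "secret ", "leak"]:
        yield chunk

async def guarded_stream():
    accumulated = ""
    async for chunk in model_stream():
        accumulated += chunk
        # The application chooses the cadence: here, every chunk.
        if evaluate_controls(accumulated)["action"] == "deny":
            return  # stop before emitting the offending chunk
        yield chunk

async def main():
    return [c async for c in guarded_stream()]

emitted = asyncio.run(main())
```

This demonstrates the tradeoff named above: the application, not the SDK, owns the buffering strategy, evaluation cadence, and deny handling.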
Integration Support Today
Streaming evaluation support currently varies by integration method:
- `@control()` decorator: supports pre-stage blocking and post-facto post-stage evaluation after #116.
- Framework-native integrations: support varies by framework.
- `evaluate_controls()` API: can be used manually for custom streaming evaluation in application-owned stream loops.
Why This Matters
For many LLM applications, streaming is the primary response path. Post-facto evaluation is not sufficient for strict blocking guardrails, because output has already been emitted before the post-stage result is known.
Post-facto evaluation is still useful for:
- audit logging
- retroactive UI redaction
- analytics and observability
Applications that require pre-emission safety guarantees need either the buffer or the incremental mode.
Phase 1: Document and Harden Current Behavior
Small follow-up work after #116:
- Add a test proving that partial stream consumption skips the post-stage check.
- Add an inline comment in the async-generator wrapper noting that post-stage evaluation only runs after full consumption.
- Update the `control()` docstring to clarify that streaming post-stage checks are post-facto, not pre-yield enforcement.
Phase 2: Design Opt-In Streaming Evaluation Modes
Longer term, users may need a choice of streaming behavior rather than a single default.
Possible direction:
```python
@control(streaming="buffer")
@control(streaming="incremental")
@control(streaming="post_facto")
```

The exact API is open. The important design question is whether Agent Control should support multiple streaming modes explicitly and, if so, which ones.
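One possible shape for such a parameter, purely illustrative since the issue leaves the real API open (the wrapper selection itself is elided; only mode validation and tagging are shown):

```python
# Illustrative-only sketch of an opt-in streaming parameter.
# Names and behavior are assumptions, not a committed design.
VALID_MODES = {"buffer", "incremental", "post_facto"}

def control(streaming: str = "post_facto"):
    # Validate eagerly so misconfiguration fails at decoration time.
    if streaming not in VALID_MODES:
        raise ValueError(f"unknown streaming mode: {streaming!r}")
    def decorator(fn):
        # Real wrapper selection (buffering, incremental checks,
        # post-facto accumulation) would happen here; elided.
        fn._streaming_mode = streaming
        return fn
    return decorator

@control(streaming="buffer")
async def generate():
    yield "chunk"

mode = generate._streaming_mode
```

Defaulting to `post_facto` would preserve backward compatibility with the #116 behavior while making the other modes opt-in.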
Constraints For Incremental Evaluation
Incremental streaming evaluation is the most attractive mode from a UX perspective, but it has significant constraints.
- SDK-side vs server-side execution: server-side evaluation on every chunk would add substantial network, latency, cost, and telemetry overhead.
- Structured vs unstructured output: partial JSON, SQL, XML, function-call payloads, and similar structured outputs are hard to evaluate meaningfully mid-stream.
- Sliding context window: evaluating only the latest chunk misses cross-chunk patterns, while evaluating the full accumulated output on every step is repetitive and expensive. A more practical pattern is to evaluate the latest chunk plus a bounded amount of prior context.
- Control eligibility: not every evaluator or control is meaningful in incremental mode, so the feature likely needs an explicit capability boundary.
- Enforcement semantics: the design has to define when checks happen and what happens on deny or steer after some output has already been emitted.
- Observability and dedupe: repeated checks on overlapping text can create duplicate events, duplicate matches, and noisy logs unless the behavior is defined carefully.
- Chunk normalization: some frameworks stream plain text, while others stream typed delta objects, partial tool calls, or partial structured payloads. Incremental evaluation likely needs a normalization layer before control execution.
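The sliding-context bullet above can be sketched concretely. This is a hypothetical helper, not SDK API: it yields the latest chunk prefixed by a bounded amount of prior context, so cross-chunk patterns remain visible without re-evaluating the full accumulated output each step:

```python
# Hypothetical sliding-window chunking for incremental evaluation.
# The window size and helper name are illustrative assumptions.
from collections import deque

def sliding_windows(chunks, max_context_chars=20):
    """Yield (bounded prior context + latest chunk) strings."""
    context = deque()  # recently seen chunks
    size = 0
    for chunk in chunks:
        yield "".join(context) + chunk
        context.append(chunk)
        size += len(chunk)
        # Evict oldest chunks once the retained context exceeds the bound.
        while size > max_context_chars and len(context) > 1:
            size -= len(context.popleft())

windows = list(sliding_windows(["abcde"] * 6, max_context_chars=10))
```

Each window would then be passed to an eligible control; the dedupe concern above arises exactly because consecutive windows overlap.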
Scope Limitations
At least for an initial version, any incremental mode would likely need to be:
- opt-in
- limited to SDK-executed controls
- limited to text output
- limited to the `@control()` decorator path
The current framework-native integrations would require integration-specific work for true incremental enforcement. Some frameworks already expose partial-response hooks, while others expose only final post-model hooks, so the implementation path is not uniform across integrations.
Acceptance Criteria
Phase 1:
- Partial-consumption behavior is covered by a test.
- Streaming post-facto semantics are documented in code and public docs.
Phase 2:
- An RFC or design doc defines the supported streaming modes and proposed API surface.
- The team decides which modes will be supported in the first implementation.
- Supported and unsupported scenarios are documented explicitly.
- If incremental mode is pursued, its eligibility rules and enforcement semantics are defined before implementation.
Non-Goals
- Re-opening the async-generator crash fixed by #116.
- Changing the meaning of batch pre/post execution checks.
- Claiming full real-time streamed-output blocking before a concrete design exists.