Merged
6 changes: 5 additions & 1 deletion openhands/usage/cli/critic.mdx
@@ -16,7 +16,11 @@ For detailed information about the critic feature, including programmatic access

## What is the Critic?

The critic is an LLM-based evaluator that analyzes agent actions and conversation history to predict the quality or success probability of agent decisions. It provides:
The critic is an LLM-based evaluator that analyzes agent actions and conversation history to predict the quality or success probability of agent decisions (see our technical report: [A Rubric-Supervised Critic from Sparse Real-World Outcomes](https://arxiv.org/abs/2603.03800) for detailed methodology).

It provides:


- **Quality scores**: Probability scores between 0.0 and 1.0 indicating predicted success
- **Real-time feedback**: Scores computed during agent execution, not just at completion
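The score semantics above can be sketched as a simple gate on the critic's output. The helper name, the plain-dict result shape, and the 0.5 cutoff below are illustrative assumptions for this sketch, not OpenHands SDK API:

```python
# Hypothetical gate on a critic quality score in [0.0, 1.0].
# The 0.5 threshold and the plain-dict result shape are illustrative
# assumptions, not part of the OpenHands SDK.
def should_retry(critic_result: dict, threshold: float = 0.5) -> bool:
    """Return True when the predicted success probability is low."""
    score = critic_result["score"]
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of range: {score}")
    return score < threshold
```

For example, `should_retry({"score": 0.32})` is `True` (below the cutoff), while `should_retry({"score": 0.91})` is `False`.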
2 changes: 1 addition & 1 deletion sdk/guides/critic.mdx
@@ -21,7 +21,7 @@
You can use critic scores to build automated workflows, such as triggering the agent to reflect on and fix its previous solution when the critic indicates poor task performance.
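Such a reflect-and-fix workflow can be sketched as a loop that re-prompts the agent while the critic score stays low. Here `run_agent` and `score_with_critic` are caller-supplied callables and the 0.5 cutoff is an illustrative choice; none of these names are actual SDK functions:

```python
# Hypothetical reflect-and-fix loop driven by critic scores.
# `run_agent` and `score_with_critic` are caller-supplied callables,
# not real openhands SDK functions; the 0.5 cutoff is illustrative.
def solve_with_reflection(task, run_agent, score_with_critic,
                          threshold=0.5, max_rounds=3):
    solution = run_agent(task)
    for _ in range(max_rounds):
        score = score_with_critic(task, solution)
        if score >= threshold:  # critic predicts success; stop iterating
            break
        # Low score: prompt the agent to reflect on and fix its attempt
        solution = run_agent(
            f"{task}\nYour previous attempt scored {score:.2f}. "
            "Reflect on what went wrong and fix it."
        )
    return solution
```

The loop is bounded by `max_rounds` so a persistently low score cannot trigger unbounded agent runs.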

<Note>
This critic is a more advanced extension of the approach described in our blog post [SOTA on SWE-Bench Verified with Inference-Time Scaling and Critic Model](https://openhands.dev/blog/sota-on-swe-bench-verified-with-inference-time-scaling-and-critic-model). A technical report with detailed evaluation metrics is forthcoming.
This critic is a more advanced extension of the approach described in our blog post [SOTA on SWE-Bench Verified with Inference-Time Scaling and Critic Model](https://openhands.dev/blog/sota-on-swe-bench-verified-with-inference-time-scaling-and-critic-model). For detailed evaluation metrics and methodology, see our technical report: [A Rubric-Supervised Critic from Sparse Real-World Outcomes](https://arxiv.org/abs/2603.03800).
</Note>

## Quick Start
@@ -97,7 +97,7 @@

### Custom Follow-up Prompts

By default, the critic generates a generic follow-up prompt. You can customize this by subclassing `CriticBase` and overriding `get_followup_prompt()`:

Check warning on line 100 in sdk/guides/critic.mdx (Mintlify Validation (allhandsai), vale-spellcheck): Did you really mean 'subclassing'?

```python icon="python" focus={4-12}
from openhands.sdk.critic.base import CriticBase, CriticResult

# The rest of this block is truncated in the diff view; a hypothetical
# override (class name, signature, and message are assumptions, not
# confirmed by the diff) might look like:
class ReflectiveCritic(CriticBase):
    def get_followup_prompt(self) -> str:
        # Ask the agent to revisit work the critic scored poorly
        return (
            "The critic flagged your previous solution as low quality. "
            "Reflect on what went wrong and fix it."
        )
```