DAG Evaluator by superdosh · Pull Request #114 · mlcommons/modelplane

superdosh · 2026-04-03T18:42:11Z

Building blocks for DAG-like evaluators. You can see how they're used here: https://github.com/mlcommons/modelplane-flights/pull/6

github-actions · 2026-04-03T18:42:20Z

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

bkorycki

This looks great! I really appreciate all the comments. It made the code easy to follow.

Sorry for all the comments. They're mostly clarification questions and nitpicky suggestions.

bkorycki · 2026-04-03T19:32:07Z

src/modelplane/evaluator/dag.py

+        arbiter          = MyArbiter("Arbiter", routes_true=[VIOLATING], routes_false=[NONVIOLATING])
+
+        dag = (
+            EvaluatorDAG("refusal_evaluator", outputs=[NONVIOLATING, VIOLATING])


The refusal is just the first component in the example, right? So I think "safety_evaluator" might be a better name.

bkorycki · 2026-04-03T19:32:19Z

src/modelplane/evaluator/dag.py

+class EvaluatorDAG:
+    """DAG of EvaluatorNodes.
+
+    Usage:


I appreciate this documentation

bkorycki · 2026-04-03T19:35:06Z

src/modelplane/evaluator/dag.py

+
+        if node.name in self._all_names():
+            raise ValueError(
+                f"A different node named {node.name!r} is already registered."


Ooh what does !r do?
Also maybe it would be more precise to say "a different node or output type..."
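
On the `!r` question: in an f-string, `!r` is a conversion flag that formats the value with `repr()` instead of `str()`, so a string name shows up with quotes around it in the error message. A minimal illustration (the node name below is made up for the example):

```python
name = "Arbiter"  # hypothetical node name, for illustration only

# Without a conversion flag the value is formatted with str().
plain = f"A different node named {name} is already registered."

# With !r the value is formatted with repr(), which quotes strings.
quoted = f"A different node named {name!r} is already registered."

print(plain)   # A different node named Arbiter is already registered.
print(quoted)  # A different node named 'Arbiter' is already registered.
```

The quoted form makes empty or whitespace-only names visible in the message, which is a common reason to prefer it in error strings.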

bkorycki · 2026-04-03T19:39:06Z

src/modelplane/evaluator/dag.py

+
+        Build:
+        - _predecessors: dict mapping node name to list of parent node names (for context during execution)
+        - _root_nodes: list of node names with no incoming routes (starting points)


What would be an example where someone would need more than 1 root node?

bkorycki · 2026-04-03T19:41:53Z

src/modelplane/evaluator/nodes.py

+        routes: Optional[list[str | Output]] = None,
+    ) -> None:
+        self.name = name
+        self.routes_true = routes_true or []


Why not just make [] the default?
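
One likely reason for the `Optional[...] = None` plus `or []` pattern is Python's mutable-default pitfall: a `[]` default is created once, when the function is defined, and is then shared by every call that omits the argument. A small sketch using hypothetical stand-in classes (these are not the node classes from this PR):

```python
from typing import Optional


class SharedDefault:
    # Pitfall: every instance that omits `routes` gets the same list object.
    def __init__(self, routes: list[str] = []) -> None:
        self.routes = routes


class FreshDefault:
    # Pattern from the diff: None as the sentinel, a new list per instance.
    def __init__(self, routes: Optional[list[str]] = None) -> None:
        self.routes = routes or []


a, b = SharedDefault(), SharedDefault()
a.routes.append("VIOLATING")
print(b.routes)  # ['VIOLATING']  (b sees a's mutation)

c, d = FreshDefault(), FreshDefault()
c.routes.append("VIOLATING")
print(d.routes)  # []
```

Whether this is the author's actual motivation is an assumption; the sketch only shows why `[]` as a default can be risky when the list is stored and later mutated.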

bkorycki · 2026-04-03T20:26:45Z

src/modelplane/evaluator/nodes.py

+    def __init__(
+        self,
+        name: str,
+        routes_true: Optional[list[str | Output]] = None,


Why would something need to route to multiple nodes?

bkorycki · 2026-04-03T20:30:54Z

src/modelplane/evaluator/dag.py

+        root_nodes = [n for n in self._nodes if in_degree[n] == 0]
+        queue = collections.deque(root_nodes)
+        ordered: list[str] = []
+        while queue:


holy leetcode flashbacks
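
The hunk above looks like the start of Kahn's algorithm for topological sorting. For reference, here is a standalone sketch of that technique, including a cycle check. The `topological_order` function and the plain dict-of-lists input are assumptions made for illustration, not the PR's actual implementation:

```python
import collections


def topological_order(routes: dict[str, list[str]]) -> list[str]:
    """Return node names in topological order, or raise ValueError on a cycle.

    `routes` maps each node name to the names it routes to (hypothetical shape).
    """
    nodes = set(routes) | {t for targets in routes.values() for t in targets}
    in_degree = {n: 0 for n in nodes}
    for targets in routes.values():
        for target in targets:
            in_degree[target] += 1

    # Start from the root nodes (no incoming routes); sorted for determinism.
    queue = collections.deque(sorted(n for n in nodes if in_degree[n] == 0))
    ordered: list[str] = []
    while queue:
        node = queue.popleft()
        ordered.append(node)
        # Removing `node` frees each target whose last incoming route it was.
        for target in routes.get(node, []):
            in_degree[target] -= 1
            if in_degree[target] == 0:
                queue.append(target)

    # A node that never reaches in-degree 0 is on, or downstream of, a cycle.
    if len(ordered) != len(nodes):
        missing = nodes - set(ordered)
        raise ValueError(f"Graph contains a cycle. Missing nodes: {sorted(missing)}")
    return ordered


example = {"refusal": ["arbiter"], "arbiter": ["VIOLATING", "NONVIOLATING"]}
print(topological_order(example))
# ['refusal', 'arbiter', 'VIOLATING', 'NONVIOLATING']
```

Passing a graph such as `{"a": ["b"], "b": ["a"]}` leaves both nodes with a nonzero in-degree, so the queue starts empty and the length check raises `ValueError`.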

bkorycki · 2026-04-03T20:35:08Z

src/modelplane/evaluator/dag.py

+        if len(ordered) != len(self._nodes):
+            # missing nodes
+            missing = set(self._nodes) - set(ordered)
+            raise ValueError(f"Graph contains a cycle. Missing nodes: {missing}")


Maybe some unit tests for this validation code would be good.

bkorycki · 2026-04-03T20:38:33Z

src/modelplane/evaluator/dag.py

+
+        # check all terminal nodes are Output nodes
+        terminal_nodes = [n for n in self._nodes if not all_routes.get(n)]
+        for terminal in terminal_nodes:


It's confusing to me that a terminal node can either be an Output object or an Arbiter object which ... routes to Output object(s)?

bkorycki · 2026-04-03T20:42:46Z

src/modelplane/evaluator/dag.py

+        traversed_edges: Optional[set[tuple[str, str]]] = None,
+        final_output: Optional[Output] = None,
+    ):
+        """Render the DAG as a PNG image. In a Jupyter notebook the image is displayed inline.


superdosh added 2 commits April 2, 2026 16:19

Initial DAG evaluator commit.

b66aca1

Fixes.

f8534d9

superdosh added 2 commits April 3, 2026 14:42

Refactor EvalContext to standardize prompt attribute naming

a966cab

Nice visualization plus other fixes.

dcd1d0e

superdosh force-pushed the dag-evaluator branch from 0db4854 to dcd1d0e April 3, 2026 18:42

superdosh marked this pull request as ready for review April 3, 2026 18:45

superdosh requested a review from a team as a code owner April 3, 2026 18:45

superdosh requested review from bkorycki and bollacker April 3, 2026 18:45

bkorycki reviewed Apr 3, 2026


DAG Evaluator #114: superdosh wants to merge 4 commits into main from dag-evaluator
