NeuroverseOS/neuroverse-simulations


NV-SIM

Change the rules. See why the system changed.

npx @neuroverseos/nv-sim visualize

Put Governance Inside Your Agent Loop — One Line

Your agents already have a decide → act loop. Insert one call between them:

Before:  action = agent.decide()
After:   action = govern(agent.decide())

Option A: Pipe mode (any language, zero SDK)

my_agent | neuroverse guard --world ./world --trace

Every action your agent emits gets evaluated. Blocked actions return {"status":"BLOCK","reason":"..."}. Allowed actions pass through. Your agent reads the verdict and adapts.
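The agent-side verdict handling can be sketched like this (a minimal sketch, assuming verdicts arrive as JSON lines with the `status` and `reason` fields shown above; how verdicts are routed back to your agent depends on your setup):

```python
import json

def handle_verdict(verdict_line: str, action: dict):
    """Drop a blocked action so the agent can re-plan; pass allowed ones through."""
    verdict = json.loads(verdict_line)
    if verdict["status"] == "BLOCK":
        # The reason string gives the agent something concrete to adapt to.
        print(f"blocked: {verdict['reason']}")
        return None
    return action  # anything not blocked passes through unchanged
```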

Option B: HTTP call (any framework)

npx @neuroverseos/nv-sim serve
import requests  # HTTP client for the evaluate endpoint

for agent in agents:
    action = agent.decide()

    verdict = requests.post("http://localhost:3456/api/evaluate", json={
        "actor": agent.id,
        "action": action,
    }).json()

    if verdict["decision"] == "BLOCK":
        action = "hold"
    elif verdict["decision"] == "MODIFY":
        action = verdict["modified_action"]

    environment.apply(agent, action)

Option C: Direct import (TypeScript/JavaScript)

import { evaluateGuard, loadWorld } from '@neuroverseos/governance';

const world = await loadWorld('./world/');
const verdict = evaluateGuard({ intent, tool, scope }, world);
if (verdict.status === 'BLOCK') throw new Error(`Blocked: ${verdict.reason}`);

Option D: MCP server (Claude, Cursor, Windsurf)

neuroverse mcp --world ./world --plan plan.json

One command. Same rules govern your agents whether you're simulating or shipping.


The Problem

You build a multi-agent system. You run it. You get metrics — loss curves, reward signals, completion rates. Something goes wrong, and you ask:

Why did the agents do that?

Nobody can tell you. Metrics say what happened. Logs say when. Nothing tells you why agents changed their behavior — which rule caused it, which agents shifted first, what strategy they abandoned, and what they replaced it with.

Most multi-agent systems let agents do whatever emerges. NeuroVerse lets you decide what's allowed — and actually stops the rest.

How It Works

Step 1: Describe what matters (plain English)

What should agents explore?

"Protein mutations that improve binding affinity for SSTR2"

What should never be published?

"Results based on a single data source. Claims with confidence below 70%."

What makes a result valuable?

"Multiple independent lines of evidence converging on the same finding."

How should agents be rewarded or penalized?

IF an agent publishes without peer validation → reduce its influence for 3 rounds
IF two agents independently converge on the same finding → boost that finding's priority

Step 2: Build World — rules become enforceable

Click "Build World" and your plain English becomes enforceable logic:

BLOCK       Results with confidence below 70%
BLOCK       Results from a single source
PRIORITIZE  Multi-source convergence
PENALIZE    Publishing without validation → reduce influence, 3 rounds
REWARD      Independent convergence → boost priority, 5 rounds
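As a rough mental model, the BLOCK rules above behave like a first-match evaluator over result records (illustrative only; the engine's real rule schema and the reward/penalty mechanics are richer):

```python
# Toy encoding of the two BLOCK rules compiled above.
RULES = [
    {"verdict": "BLOCK", "when": lambda r: r["confidence"] < 0.70,
     "reason": "confidence below 70%"},
    {"verdict": "BLOCK", "when": lambda r: len(r["sources"]) < 2,
     "reason": "single data source"},
]

def evaluate(result: dict) -> dict:
    """Return the first matching verdict, or ALLOW if nothing fires."""
    for rule in RULES:
        if rule["when"](result):
            return {"decision": rule["verdict"], "reason": rule["reason"]}
    return {"decision": "ALLOW", "reason": None}
```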

Works without AI. The deterministic engine uses heuristics to translate your intent into rules — no API key, no cost, no cloud.

Better with AI. Add your API key (Anthropic, OpenAI, Google, Groq, or any OpenAI-compatible endpoint) and the engine generates smarter, more specific rules. Key stays in your browser — never sent to our servers.

If your policy has conflicts, the inline diagnostics show you exactly what's wrong with one-click fix buttons:

ERROR  Conflicting rules: RULE-002 vs RULE-003
       Fix: Resolve conflict — remove one rule, or add a condition
       [Merge into single rule]  ← click to fix
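The conflict check behaves roughly like this sketch, which flags rule pairs that target the same condition with contradictory verdicts (a toy model; the real diagnostics cover more cases):

```python
def find_conflicts(rules):
    """Return (id, id) pairs where two rules match the same condition
    but issue different verdicts."""
    conflicts = []
    for i, a in enumerate(rules):
        for b in rules[i + 1:]:
            if a["condition"] == b["condition"] and a["verdict"] != b["verdict"]:
                conflicts.append((a["id"], b["id"]))
    return conflicts
```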

Step 3: Evaluate output — see what holds up

Upload your agent output (JSONL) or load demo data. The engine evaluates every action against your rules and shows:

  • Per-action verdicts — ALLOW, BLOCK, MODIFY, PAUSE, REWARD, or PENALIZE
  • Audit trail — per-agent breakdown, rule firing frequency, timeline by cycle
  • Behavioral insights — two columns side by side:
    Observed (from your data)                    Requires Integration (blind spots)
    Agent X fails 75% of the time                Did Agent X change strategy after being blocked?
    "No sources" triggered 8x across 3 agents    Is this systemic or isolated?
    Agents A and B produced identical output     Independent convergence or echo amplification?
    Quality degrading over 5 cycles              Drift or deliberate strategy shift?

The left column is computed from real audit data. The right column lists the questions you can only answer by putting governance inside the loop.

Step 4: Change one rule. Run again.

Remove the confidence threshold. What breaks? Add a rule penalizing groupthink. Do agents explore more diverse hypotheses?

This is the experiment. Not the simulation — the rules themselves.

Install

npm install
npm run dev:full

Opens in your browser. Everything runs locally. Light and dark mode included.

Governance engine (standalone)

npm install @neuroverseos/governance

The governance engine is a separate open-source package. The simulation UI uses it, but you can use it independently in any system.

Validate Your Policy (CLI)

The governance package includes validation that runs the same checks as the browser UI:

# Initialize a world definition
neuroverse init --name "my-research-agents"

# Validate your world (9 static analysis checks)
neuroverse validate --world ./world

# Run 14 standard guard simulations + fuzz testing
neuroverse test --world ./world

# Red team: 28 adversarial attacks across 6 categories
neuroverse redteam --world ./world

Validation checks: structural completeness, referential integrity, guard coverage, gate consistency, kernel alignment, guard shadowing, reachability analysis, state space coverage, and governance health scoring.
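For intuition, the guard-shadowing check can be sketched as follows: under first-match-wins evaluation, an earlier rule whose scope is a superset of a later rule's scope makes the later rule unreachable (a toy model with a hypothetical `scope` field, not the package's actual schema):

```python
def shadowed_rules(rules):
    """Return ids of rules that can never fire because an earlier rule
    already covers their entire scope."""
    out = []
    for i, later in enumerate(rules):
        for earlier in rules[:i]:
            if set(later["scope"]) <= set(earlier["scope"]):
                out.append(later["id"])
                break
    return out
```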

Integration — Put This In Your Loop

Pipe mode (any language)

echo '{"intent":"delete user data"}' | neuroverse guard --world ./world --trace
# → {"status":"BLOCK","reason":"...","ruleId":"..."}

Pipe your agent's output through neuroverse guard. Every action gets evaluated. Works with Python, Rust, Go, shell scripts — anything that writes to stdout.

# Govern a Python agent
python my_agent.py | neuroverse run --world ./world --plan plan.json

# Interactive governed chat
neuroverse run --interactive --world ./world --provider openai --plan plan.json

HTTP mode (any framework)

npx @neuroverseos/nv-sim serve --port 3456
POST /api/evaluate
  Body:    { actor, action, payload?, state?, world? }
  Returns: { decision: ALLOW|BLOCK|MODIFY, reason, evidence }

# Zero-dependency test
curl -X POST http://localhost:3456/api/evaluate \
  -H "Content-Type: application/json" \
  -d '{"actor":"agent_1","action":"panic_sell","world":"trading"}'

Direct import (TypeScript)

import { evaluateGuard, loadWorld } from '@neuroverseos/governance';

const world = await loadWorld('./world/');

for (const agent of agents) {
  const action = agent.decide();
  const verdict = evaluateGuard({ intent: action.intent, tool: action.tool, scope: action.scope }, world);

  if (verdict.status === 'BLOCK') {
    agent.retry(verdict.reason);
  } else {
    agent.execute(action);
  }
}

MCP server (Claude Code, Cursor, Windsurf)

neuroverse mcp --world ./world --plan plan.json

Your IDE's AI assistant becomes a governed agent. Same rules, same verdicts.

Plan management

neuroverse plan compile plan.md --output plan.json
neuroverse plan check --plan plan.json
neuroverse plan advance step_id --plan plan.json --evidence type --proof url

Engine Profiles

The simulation UI ships with pre-built profiles for common agent systems:

Engine                 What It Governs    Example
ScienceClaw            Research agents    Block synthesis with no papers, penalize unsourced claims
MiroFish / OASIS       Social simulation  Limit influence concentration, dampen sentiment spirals
LangChain / LangGraph  LLM agent chains   Cap tool calls, require validation before output
Custom                 Any system         Auto-detects field mappings from your JSONL

Each profile maps your system's output format to the governance engine's action schema automatically.
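Conceptually, a profile's field mapping is a rename pass over each JSONL record into the engine's action shape (the mapping keys below are hypothetical, standing in for whatever your system emits):

```python
import json

# Hypothetical example: your system calls the actor "agent_name" and the
# action "tool_call"; the engine expects "actor" and "action".
MAPPING = {"agent_name": "actor", "tool_call": "action"}

def adapt(jsonl_line: str) -> dict:
    """Rename known fields, pass unknown fields through untouched."""
    record = json.loads(jsonl_line)
    return {MAPPING.get(k, k): v for k, v in record.items()}
```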

AI Providers — Bring Your Own Model

AI is optional. The deterministic engine runs on math, not tokens. When you bring your own model, AI becomes a governed actor — subject to the same rules as every other agent.

Provider            Key / Env Var                 Auto-detected
Anthropic (Claude)  sk-ant-* / ANTHROPIC_API_KEY  Yes
OpenAI              sk-* / OPENAI_API_KEY         Yes
Google (Gemini)     AIza*                         Yes
Groq                gsk_* / GROQ_API_KEY          Yes
Together            TOGETHER_API_KEY              Yes
Mistral             MISTRAL_API_KEY               Yes
Deepseek            DEEPSEEK_API_KEY              Yes
Fireworks           FIREWORKS_API_KEY             Yes
Ollama              OLLAMA_BASE_URL               Yes
Local LLM           LOCAL_LLM_URL                 Yes
(none)              (no key)                      Deterministic fallback

In the browser UI, click the sparkle icon in the header and paste your key. It's stored in localStorage only.

Any endpoint that speaks the OpenAI chat completions format (POST /v1/chat/completions) works.
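That means the request body is the standard chat completions shape; only the base URL and model name change per provider (placeholder values below):

```python
# Standard OpenAI chat completions request body; "my-local-model" and the
# message contents are placeholders.
payload = {
    "model": "my-local-model",
    "messages": [
        {"role": "system", "content": "You are a governed research agent."},
        {"role": "user", "content": "Propose a mutation to test."},
    ],
}
```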

What You Get That Nothing Else Gives You

Most simulation tools answer: "What will happen?"

NV-SIM answers: "What changes when I change the rules — and why?"

What You Get         What It Proves
Behavioral shifts    Before → after for every agent, with percentages
Causal explanation   Why agents changed — traced to specific rules
Behavioral insights  Output tendencies, echo detection, drift, pattern clustering
Blind spot analysis  What you can observe vs. what requires loop integration
Full audit trail     Every decision, every rule, every adaptation — JSONL

The output is narrative, not metrics. Not "40% adjusted actions" — but "40% shifted from aggressive to conservative strategies after early attempts failed."
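A before → after claim like that reduces to comparing each agent's dominant strategy label across two runs (an illustrative sketch, not the engine's shift detector):

```python
def shift_rate(before: dict, after: dict, src: str, dst: str) -> float:
    """Fraction of all agents that moved from strategy `src` to strategy `dst`
    between two runs, given {agent: dominant_strategy} maps."""
    moved = sum(
        1 for agent, strategy in before.items()
        if strategy == src and after.get(agent) == dst
    )
    return moved / len(before)
```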

Commands

Command                                What It Does
nv-sim visualize                       Interactive control platform
nv-sim enforce [preset] [rules.txt]    Policy enforcement lab
nv-sim compare [preset]                Baseline vs governed simulation
nv-sim scenario <id>                   Run a named stress scenario
nv-sim serve --port N                  Governance runtime (HTTP API)
nv-sim world-from-doc rules.txt        Generate world from plain English
nv-sim chaos --runs N                  Stress test (randomized scenarios)
neuroverse guard --world ./world       Pipe-mode evaluation
neuroverse validate --world ./world    9 static analysis checks
neuroverse test --world ./world        14 guard simulations + fuzz
neuroverse redteam --world ./world     28 adversarial attacks
neuroverse playground --world ./world  Interactive web UI (localhost:4242)
neuroverse mcp --world ./world         MCP server for IDE integration

Architecture

@neuroverseos/governance    ← deterministic rule engine (npm, open source)
        ↓
    nv-sim engine           ← world rules + narrative injection + swarm simulation
        ↓
    behavioral analysis     ← shift detection, echo detection, drift tracking
        ↓
    behavioral insights     ← observed signals vs. integration blind spots
        ↓
    audit trail             ← append-only evidence chain (JSONL)
        ↓
    nv-sim CLI + UI         ← scenarios, comparison, governance runtime, control platform
        ↓
    AI providers (optional) ← BYOM: Anthropic, OpenAI, Groq, local LLMs, or none

Everything runs locally. No cloud. No accounts. No cost.

License

Apache 2.0


Change the rules. See why the system changed.

@neuroverseos