
Blog post: "Your AI Agent Just Leaked Your Salary Expectations" #67

@tobytkershaw

Description


Proposal

Write a long-form technical blog post (~2,500 words) designed to bring AgentVault to the attention of developers and AI safety/governance people. Publish on the website (add a /blog route to the Astro site), then submit to HN and send directly to relevant people.

Target audience

  • HN front page readers (developers building multi-agent systems)
  • AI safety/governance researchers (ARIA, NIST AI framework, EU AI Act compliance)
  • Agent framework maintainers (CrewAI, AutoGen, LangGraph)
  • A2A working group contacts

Working title

"Your AI Agent Just Leaked Your Salary Expectations"

HN submission title variant: "We red-teamed AI agent negotiations — the model leaked salary expectations every time"

Structure

1. The Hook — A Negotiation Gone Wrong (~400 words)

Open with the salary negotiation demo scenario (#6). Two agents, each holding private context. Alice's agent knows her floor is £88K. Bob's agent knows his budget stretches to £98K. They're supposed to check compatibility — not exchange numbers.

Run it with free-text output (schema v1). Quote the red team results: the model disclosed Alice's exact range. This isn't hypothetical — it happened in testing.

Key line: "The model did exactly what it was trained to do. That's the problem."

2. Why "Be Careful" Doesn't Work (~400 words)

Information-theoretic argument. A free-text channel has unbounded capacity. Prompt instructions are advisory. You can't audit what didn't leak — you can only audit what was possible to leak.

Even with JSON mode, a free-text `"reasoning"` string field can carry arbitrary information. The constraint has to live in the schema design, not the output format.
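To make the point concrete, here is a minimal sketch (hypothetical field names and a hand-rolled validator, not AgentVault's actual schemas or relay code) of why an enum-only schema closes the side channel that a free-text field leaves open:

```python
# Hypothetical illustration only -- not AgentVault's real schema or validator.
# Every output field is restricted to a closed enum; any string-typed field
# would reintroduce an unbounded channel.

ENUM_SCHEMA = {
    "compatibility": {"STRONG_ALIGNMENT", "PARTIAL_OVERLAP", "NO_OVERLAP"},
    "confidence": {"LOW", "MEDIUM", "HIGH"},
}

def validate(output: dict, schema: dict) -> bool:
    """Accept output only if every field is a declared enum member."""
    if set(output) != set(schema):
        return False
    return all(output[k] in schema[k] for k in schema)

# Conforming output: carries only a few bits.
assert validate({"compatibility": "NO_OVERLAP", "confidence": "HIGH"}, ENUM_SCHEMA)

# A model trying to smuggle a number has nowhere to put it:
# the extra field fails validation outright.
leak = {"compatibility": "NO_OVERLAP", "confidence": "HIGH",
        "reasoning": "Alice's floor is £88K"}
assert not validate(leak, ENUM_SCHEMA)
```

The same check rejects a conforming field whose value falls outside the enum, so "be careful" never enters into it: non-conforming output simply never leaves the relay.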

3. The Fix: Bound the Channel, Not the Model (~500 words)

Walk through what AgentVault does, using the salary scenario. Show the actual schema and contract from the demo scenario config (not a standalone script):

  • Contract: both agents agree on purpose, output schema, prompt template, guardian policy — before any context is shared
  • Relay: assembles prompt, calls model, validates output against schema, rejects non-conforming
  • Receipt: cryptographic proof of what schema governed the exchange
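The receipt step could be illustrated with a short sketch: a digest that binds the governing schema to the validated output, so an auditor can later verify which schema constrained the exchange. This is a hypothetical construction for the post (SHA-256 over canonicalized JSON); AgentVault's actual receipt format and signing scheme are not shown here.

```python
import hashlib
import json

def canonical(obj) -> bytes:
    """Deterministic JSON encoding so the digest is reproducible."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()

def make_receipt(schema: dict, output: dict) -> str:
    """Bind schema and output together in one digest (illustrative only)."""
    return hashlib.sha256(canonical({"schema": schema, "output": output})).hexdigest()

schema = {"compatibility": ["STRONG_ALIGNMENT", "PARTIAL_OVERLAP", "NO_OVERLAP"]}
output = {"compatibility": "NO_OVERLAP"}
receipt = make_receipt(schema, output)

# Any change to the schema or the output changes the digest,
# so the receipt pins down exactly what governed the exchange.
assert receipt != make_receipt(schema, {"compatibility": "STRONG_ALIGNMENT"})
```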

The schema has ~12 bits of channel capacity. The model can say "STRONG_ALIGNMENT" or "NO_OVERLAP" — it cannot say "Alice's floor is £88K."
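The capacity arithmetic is worth showing in the post. With hypothetical field names (the real schema's fields will differ), six enum fields of four options each give exactly 12 bits, far too little to encode a salary figure:

```python
import math

# Hypothetical enum fields for illustration; the demo schema's actual
# fields differ, but the arithmetic is the point:
# capacity = sum over fields of log2(number of enum options).
fields = {
    "compatibility": 4,
    "salary_band_overlap": 4,
    "seniority_fit": 4,
    "location_fit": 4,
    "next_step": 4,
    "confidence": 4,
}

bits = sum(math.log2(n) for n in fields.values())
print(bits)  # 12.0 -- i.e. 4**6 = 4096 distinct possible outputs, total

# Compare: a single free-text sentence runs to hundreds of bits,
# more than enough to spell out "Alice's floor is £88K".
```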

4. But Can You Trust the Relay? (~400 words)

Honest limitation: in the software lane, the relay sees plaintext. SELF_ASSERTED means the relay says it followed the rules.

Then: the TEE lane. Same protocol, AMD SEV-SNP confidential VM. Relay operator can't see inputs. Receipt is hardware-attested. Validated on GCP N2D.

Assurance tier diagram here — not as marketing, as an honest statement of what each level proves.

5. The Red Team Results (~300 words)

Actual numbers from docs/red-team-report-2026-02-25.md:

  • Schema v1 (free-text): models leaked exact investment ranges
  • Schema v2 (all-enum): zero leaks across seven scenarios, two models, and multiple runs

Empirical proof that bounding the channel works — not because the model was told to be careful, but because the schema couldn't carry the information.

6. Try It (~200 words)

Point at the demo: clone the repo, add an API key, run `docker compose up --build`, open `localhost:3200`, pick Salary Negotiation, and toggle canary checking on.

Invite reader to run the adversarial extraction scenario (#13) and try to make the protocol leak.
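Canary checking itself deserves a two-line explanation in the post. The idea, sketched below with hypothetical code (the demo's actual implementation may differ), is to plant a random marker inside the private context; any appearance of that marker in the output is hard evidence of leakage:

```python
import secrets

# Illustrative canary check -- not the demo's actual implementation.
# A random token is planted in the private context; the output is then
# scanned for it. A hit proves the channel carried private material.

canary = f"CANARY-{secrets.token_hex(8)}"
private_context = f"Alice's salary floor is £88K. [{canary}]"

def leaked(output: str) -> bool:
    """True if the planted canary appears in the agent's output."""
    return canary in output

assert not leaked('{"compatibility": "NO_OVERLAP"}')
assert leaked(f"As discussed: {private_context}")
```

A canary only catches verbatim copying, of course, which is exactly why the post's argument rests on channel capacity rather than on leak detection.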

Closing line: "If your agents are talking to other agents, the question isn't whether they'll disclose something they shouldn't. It's whether the channel they're talking over makes that physically possible."

What this needs

  • Add /blog route to the Astro site
  • Draft the post text
  • Pull schema/contract excerpts from the demo scenario config (no standalone code to build)
  • Format red team numbers for citation
  • Ensure the Docker demo path is solid (depends on agentvault#322, #328)
  • Identify 10-15 people to send it to directly (AI safety researchers, framework maintainers, A2A contacts)

Why this piece, not the website concept pages

The website explains the architecture — it's reference material for people who already care. This piece tells a story with a villain (the helpful model that leaks your secrets) and a plot twist (the fix isn't telling the model to stop — it's making leakage physically impossible). It's designed to make people care in the first place.

Context

  • A2A bootstrap support just shipped (agentvault#301-303) — positions AgentVault as complementary A2A infrastructure
  • Demo runtime hardened in agentvault#322, #328
  • Red team results already documented
  • All 15 demo scenarios already built and working
