From Tamil பின்மொழி (pin mozhi) -- "afterword"
An AI-powered CLI for testing SIP and WebRTC voice endpoints. Describe what you want to test in plain English, and Pinmoli handles the protocol details -- INVITE flows, codec negotiation, RTP streaming, failure analysis.
Think "Postman for Voice", but conversational.
$ pinmoli
Pinmoli - SIP/WebRTC Testing Agent
You: Test sip:+15551234567@trunk.example.com with INVITE, wait 15 seconds for a response
Pinmoli: Running INVITE test against sip:+15551234567@trunk.example.com...
[sip_test] INVITE sip:+15551234567@trunk.example.com
├─ 100 Trying (12ms)
├─ 180 Ringing (45ms)
├─ 200 OK (1203ms) — codec: PCMU/8000
├─ ACK sent
├─ RTP: sent 150 packets (voice-hello, 3.0s)
├─ RTP: waiting 15s for agent response...
├─ RTP: received 1247 packets (15.0s)
└─ BYE sent, 200 OK
Call completed successfully. The agent answered after 1.2s and spoke for
the full 15-second window. Codec negotiated: PCMU/8000 (G.711 u-law).
- Natural language interface -- describe tests in plain English
- Full SIP call flows -- OPTIONS, INVITE with SDP, REGISTER with auth, ACK, BYE
- WebRTC via WHIP -- connect to any WHIP endpoint (LiveKit, Cloudflare, Janus)
- Bidirectional RTP audio -- send speech, receive and measure agent responses
- DTMF send and receive (RFC 4733) -- navigate IVR menus, detect incoming DTMF
- Runtime speech synthesis -- espeak (offline) or Gemini TTS (high quality, Vertex AI)
- Real codec negotiation -- PCMU, PCMA, G722, opus with automatic transcoding
- Failure analysis -- pattern-matched diagnostics with actionable recovery steps
- Test persistence -- save, load, list test configs (SQLite + FTS5)
- Per-session output -- each run creates a directory with signaling logs, metadata, flow.json, audio WAVs
- Interactive multi-turn calls -- start_call → send_audio → receive_audio → end_call, with live TUI indicators
- Snapshot replay -- replay saved interactive calls from WAV files against the live endpoint, compare results
- Session replay -- re-execute recorded sessions without LLM, compare flows
- Automatic packet capture -- SIP + RTP traffic to pcap (Wireshark-ready)
- Pipe mode -- stdin/stdout for scripting and CI
- STUN NAT discovery -- public IP:port for SDP, works in WSL2/Docker
- Runs in Docker -- ffmpeg, espeak, tcpdump, tini included
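The DTMF support in the list above follows RFC 4733, where digits travel as 4-byte telephone-event payloads inside RTP rather than as in-band tones. A minimal sketch of building such a payload (illustrative only, not Pinmoli's implementation):

```typescript
// RFC 4733 telephone-event payload:
//   byte 0: event code (0-9, *, #, A-D)
//   byte 1: E flag (end of event) | reserved | volume (6 bits)
//   bytes 2-3: duration in timestamp units, big-endian (8000 Hz for PCMU)
const DTMF_EVENTS = "0123456789*#ABCD";

function dtmfPayload(
  digit: string,
  end: boolean,
  volume: number,
  duration: number,
): Uint8Array {
  const event = DTMF_EVENTS.indexOf(digit);
  if (event < 0) throw new Error(`not a DTMF digit: ${digit}`);
  const buf = new Uint8Array(4);
  buf[0] = event;                               // '1' -> 1, '#' -> 11
  buf[1] = (end ? 0x80 : 0) | (volume & 0x3f);  // E flag marks the final packet
  buf[2] = (duration >> 8) & 0xff;
  buf[3] = duration & 0xff;
  return buf;
}

// Digit "1" at volume 10, 800 timestamp units (100 ms @ 8 kHz), end packet:
const p = dtmfPayload("1", true, 10, 800);
```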
- Docker
- An LLM provider credential (see below)
Vertex AI gives you Gemini as the LLM provider plus Gemini TTS for high-quality speech generation. The docker-compose.yml is pre-configured -- just drop in a service account key.
1. Create a service account:
# In Google Cloud Console or via gcloud:
gcloud iam service-accounts create pinmoli \
--display-name="Pinmoli SIP Tester"
gcloud projects add-iam-policy-binding YOUR_PROJECT_ID \
--member="serviceAccount:pinmoli@YOUR_PROJECT_ID.iam.gserviceaccount.com" \
--role="roles/aiplatform.user"
gcloud iam service-accounts keys create secrets/gcp-service-account.json \
  --iam-account=pinmoli@YOUR_PROJECT_ID.iam.gserviceaccount.com

2. Place the JSON key:
mkdir -p secrets
# Move your downloaded key to:
# secrets/gcp-service-account.json

3. Set your project (if not lifeandhalf-24122025):
Create or edit .env:
GOOGLE_CLOUD_PROJECT=your-project-id
# GOOGLE_CLOUD_LOCATION=us-central1 # default, change if needed

4. Start Pinmoli:
docker compose build
docker compose up -d
docker compose exec pinmoli npx tsx src/cli.ts --service-account /app/secrets/gcp-service-account.json

The docker-compose.yml maps secrets/ into the container at /app/secrets/ (via the .:/app bind mount) and sets GOOGLE_APPLICATION_CREDENTIALS, GOOGLE_CLOUD_PROJECT, and GOOGLE_CLOUD_LOCATION automatically.
Set one environment variable and Pinmoli auto-detects the provider:
Anthropic:
echo "ANTHROPIC_API_KEY=sk-ant-..." >> .env
docker compose exec pinmoli npx tsx src/cli.ts

OpenAI:
echo "OPENAI_API_KEY=sk-..." >> .env
docker compose exec pinmoli npx tsx src/cli.ts

Google Gemini (API key):
echo "GEMINI_API_KEY=..." >> .env
docker compose exec pinmoli npx tsx src/cli.ts

Note: The Gemini API key path does not support TTS. Use Vertex AI (service account) for Gemini TTS.
Groq:
echo "GROQ_API_KEY=gsk_..." >> .env
docker compose exec pinmoli npx tsx src/cli.ts

OpenRouter:
echo "OPENROUTER_API_KEY=..." >> .env
docker compose exec pinmoli npx tsx src/cli.ts

docker pull ghcr.io/arakoodev/pinmoli:latest

Run with any provider:
# Anthropic
docker run --rm -it --network host \
-v $(pwd)/captures:/app/captures \
-e ANTHROPIC_API_KEY=sk-ant-... \
ghcr.io/arakoodev/pinmoli
# OpenAI
docker run --rm -it --network host \
-v $(pwd)/captures:/app/captures \
-e OPENAI_API_KEY=sk-... \
ghcr.io/arakoodev/pinmoli
# Google Gemini (API key)
docker run --rm -it --network host \
-v $(pwd)/captures:/app/captures \
-e GEMINI_API_KEY=... \
ghcr.io/arakoodev/pinmoli
# Google Vertex AI (service account)
docker run --rm -it --network host \
-v $(pwd)/captures:/app/captures \
-v /path/to/key.json:/credentials.json:ro \
ghcr.io/arakoodev/pinmoli --service-account /credentials.json
# Groq
docker run --rm -it --network host \
-v $(pwd)/captures:/app/captures \
-e GROQ_API_KEY=gsk_... \
ghcr.io/arakoodev/pinmoli

The -v $(pwd)/captures:/app/captures mount persists packet captures and session output to your local machine. The image is published automatically on every push to main via GitHub Actions.
docker compose exec pinmoli npx tsx src/cli.ts

Type test requests in natural language. Slash commands:
- `/model anthropic claude-sonnet-4-5` -- switch LLM provider/model at runtime
- `/model` -- show current provider and model
- `/service-account /path/to/key.json` -- configure Vertex AI credentials
- Ctrl+C -- abort current operation / clear input / quit
For scripting, CI, or piping from another process:
# Single message
echo "test sip:+1234567890@host with OPTIONS" | \
docker compose exec -T pinmoli npx tsx src/cli-pipe.ts
# Multi-turn conversation
docker compose exec -T pinmoli npx tsx src/cli-pipe.ts <<'EOF'
test sip:+1234567890@trunk.example.com with OPTIONS
now try INVITE with PCMU, sendDelay 8, responseWaitTime 20
analyze the failure
EOF

Agent responses go to stdout, tool output and status go to stderr.
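The stdout/stderr split is what makes pipe mode scriptable: a CI job can capture the agent's answers while discarding (or logging) status chatter. A minimal Node sketch of collecting the two streams separately, using `node -e` as a stand-in for the pipe-mode process:

```typescript
import { spawnSync } from "node:child_process";

// Stand-in child that writes to both streams, the way cli-pipe.ts does:
// agent text on stdout, tool/status lines on stderr.
const r = spawnSync(
  process.execPath,
  ["-e", 'console.log("agent reply"); console.error("[tool] status line");'],
  { encoding: "utf8" },
);

const agentText = r.stdout.trim();  // what you would pipe onward
const statusText = r.stderr.trim(); // what you would show or discard
```

In a real pipeline you would spawn `docker compose exec -T pinmoli npx tsx src/cli-pipe.ts` instead of the stand-in child.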
Re-execute a recorded session without the LLM. Compares the replay flow against the original:
docker compose exec pinmoli npx tsx src/cli-replay.ts captures/<session-id>

The session directory must contain a manifest.json (auto-created by Pinmoli). Each tool call is replayed with the same parameters. Original and replay flow.json files are compared side-by-side, showing sequence matches, timing deltas, and codec/RTP differences.
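The comparison boils down to lining up the two signaling sequences and, when they match, reporting per-event timing deltas. A simplified sketch of that check (illustrative only; the real flow.json schema and comparison logic may differ):

```typescript
interface FlowEvent {
  type: string; // e.g. "INVITE", "100", "180", "200"
  atMs: number; // offset from call start
}

// Sequences match when the message types line up one-for-one;
// deltas show how much later (or earlier) each replay event arrived.
function compareFlows(original: FlowEvent[], replay: FlowEvent[]) {
  const sequenceMatch =
    original.length === replay.length &&
    original.every((e, i) => e.type === replay[i].type);
  const timingDeltasMs = sequenceMatch
    ? original.map((e, i) => replay[i].atMs - e.atMs)
    : [];
  return { sequenceMatch, timingDeltasMs };
}

const orig = [
  { type: "INVITE", atMs: 0 },
  { type: "100", atMs: 12 },
  { type: "180", atMs: 45 },
  { type: "200", atMs: 1203 },
];
const rep = [
  { type: "INVITE", atMs: 0 },
  { type: "100", atMs: 15 },
  { type: "180", atMs: 50 },
  { type: "200", atMs: 1100 },
];
const result = compareFlows(orig, rep);
```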
# Example
docker compose exec pinmoli npx tsx src/cli-replay.ts captures/20260320-065054-tw1x

Use the SIP engine directly as a library:
docker compose exec pinmoli npx tsx -e "
import { runSipTest } from './src/sip/engine.js';
for await (const event of runSipTest({
uri: 'sip:trunk.example.com',
method: 'OPTIONS',
codecs: ['PCMU']
})) { console.log(JSON.stringify(event)); }
"Beyond one-shot tests (sip_test), Pinmoli supports interactive multi-turn SIP conversations — call a voice agent, listen to its greeting, speak back, listen to its response, repeat. The call stays open across tool calls, and the AI agent drives the conversation.
| Tool | Purpose |
|---|---|
| `start_call` | INVITE → 200 OK → ACK. Returns a `callId` for subsequent tools |
| `send_audio` | Send a WAV file (auto-wired from `generate_audio`) or DTMF digits |
| `receive_audio` | Listen for agent audio for N seconds (max 60), save as WAV |
| `end_call` | BYE → close sockets → cleanup |
Tell the TUI what you want in plain English:
You: Call sip:+18144693283@5789pyhutlx.sip.livekit.cloud and have
a two-turn conversation. Say hello twice, listen between each.
The agent orchestrates the call through the four tools. Each tool appears as a collapsible section in the TUI with real-time event streaming:
▼ ⠙ start_call (12 events) ← bright yellow, animated spinner
[INFO] +0.045s Starting SIP INVITE to sip:+18144693283@...
[INFO] +0.058s Public IP: 34.56.78.90, RTP mapped to :54321 (STUN)
[SIP] +0.321s Sending INVITE request...
[SIP] +0.493s Received 100 Processing
[SIP] +2.100s Received 180 Ringing
[SIP] +5.850s Received 200 OK
[INFO] +5.851s Codec negotiated: PCMU (PT=0, clock=8000Hz)
[SIP] +5.852s Sending ACK
[INFO] +5.853s Call established — callId: abc123, codec: PCMU
▶ ✓ start_call (12 events) ← auto-collapses, checkmark
▼ ⠧ send_audio (2 events) ← turn 1: send greeting
[INFO] +0.015s Sending audio as PCMU to 34.56.78.90:10000
[INFO] +3.200s Sent 260 RTP packets — saved: sent-audio-1.wav
▶ ✓ send_audio (2 events)
▼ ⠸ receive_audio (2 events) ← turn 1: agent responds
[INFO] +0.008s Listening for audio on port 54321 (15s)...
[INFO] +12.340s Received 735 RTP packets — saved: agent-response-2.wav
▶ ✓ receive_audio (2 events)
▼ ⠴ send_audio (2 events) ← turn 2: send again
[INFO] +0.012s Sending audio as PCMU to 34.56.78.90:10000
[INFO] +3.180s Sent 258 RTP packets — saved: sent-audio-3.wav
▶ ✓ send_audio (2 events)
▼ ⠦ receive_audio (2 events) ← turn 2: agent responds
[INFO] +0.009s Listening for audio on port 54321 (15s)...
[INFO] +14.100s Received 740 RTP packets — saved: agent-response-5.wav
▶ ✓ receive_audio (2 events)
▼ ⠙ end_call (2 events) ← hang up
[SIP] +0.100s Sending BYE
[SIP] +0.200s Call terminated
▶ ✓ end_call (2 events)
The TUI's ToolOutputSection component shows call state visually:
| Indicator | Meaning |
|---|---|
| `▼ ⠧ start_call (5 events)` | Running — bright yellow header, animated braille spinner cycling at 80ms (⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏), section expanded, events streaming live |
| `▶ ✓ receive_audio (2 events)` | Succeeded — dimmed header, checkmark, section auto-collapsed |
| `▶ ✗ start_call (8 events)` | Failed — dimmed header, X mark, section auto-collapsed |
The spinner is the primary active-call indicator. During receive_audio, it keeps spinning for the entire listen duration (up to 60 seconds) — you can see at a glance that a call is alive and recording.
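Frame cycling at a fixed interval means the frame shown at any moment is just modular arithmetic over elapsed time. An illustrative sketch of the 80 ms braille cycle (not the TUI's actual code):

```typescript
const FRAMES = ["⠋", "⠙", "⠹", "⠸", "⠼", "⠴", "⠦", "⠧", "⠇", "⠏"];
const INTERVAL_MS = 80; // one full cycle every 800 ms

// Which spinner frame is visible after `elapsedMs` of spinning?
function frameAt(elapsedMs: number): string {
  return FRAMES[Math.floor(elapsedMs / INTERVAL_MS) % FRAMES.length];
}
```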
Ctrl+O toggles expansion of the last completed tool section to re-inspect events.
Between tool calls, the call is still active even though no section is spinning. The agent tracks the callId in its conversation context and knows the call is open until end_call. If you ask "is the call still active?" the agent can answer based on whether it has called end_call yet.
On exit (Ctrl+C), terminateAll() sends BYE on every active call before the process exits — no orphaned calls.
Some agents speak first. Listen before sending:
You: Call the agent, wait 8 seconds for its greeting, then respond.
The agent calls receive_audio immediately after start_call (before any send_audio), capturing the agent's opening message.
Each interactive call writes to its own directory under captures/:
captures/20260323-140530-x7k2/
sip-invite-5789pyhutlx.sip.livekit.cloud-20260323-140530/
sip-log.txt ← raw SIP messages (>>>SENT, <<<RECEIVED)
metadata.json ← duration, codec, success, turn count
sent-audio-1.wav ← what you sent (turn 1)
agent-response-2.wav ← what the agent said (turn 1)
sent-audio-3.wav ← what you sent (turn 2)
agent-response-4.wav ← what the agent said (turn 2)
scenario-manifest.json ← turn structure for snapshot replay
flow.json ← structured event timeline
File numbering follows the turn counter: sends get odd numbers (1, 3, 5...), receives get even numbers (2, 4, 6...).
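Under that scheme the filename for any exchange falls out of a single counter that increments on every send or receive. A sketch of the convention (illustrative, not Pinmoli's code):

```typescript
// One counter across the whole call: sends land on odd numbers,
// receives on even, so each turn's pair sorts adjacently on disk.
function audioFileName(kind: "send" | "receive", counter: number): string {
  return kind === "send"
    ? `sent-audio-${counter}.wav`
    : `agent-response-${counter}.wav`;
}

// Turn 1: send (1) then receive (2); turn 2: send (3) then receive (4).
const files = [1, 2, 3, 4].map((n) =>
  audioFileName(n % 2 === 1 ? "send" : "receive", n),
);
```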
After running interactive scenarios, replay the exact same conversation from saved WAV files — no TTS, no LLM:
# Run scenarios to generate snapshots
docker compose exec pinmoli npx tsx test/scenarios/run-scenarios.ts
# Replay all scenarios from a session
docker compose exec pinmoli npx tsx src/cli-replay-snapshot.ts captures/20260323-140530-x7k2
# Replay one specific scenario
docker compose exec pinmoli npx tsx src/cli-replay-snapshot.ts \
  captures/20260323-140530-x7k2 --scenario 2-multi-turn

The replay engine reads scenario-manifest.json, opens a real SIP call to the same URI, sends the same audio files in order, listens for the same durations, and compares:
- Signaling sequence — INVITE/100/180/200/ACK/BYE must match
- Per-turn audio match — both original and replay got audio, or both got silence
- Packet count tolerance — within 30% (agent speech varies between runs)
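The tolerance check can be expressed as a relative-difference comparison, assuming the 30% is measured against the original run's packet count (a sketch of the rule as described above, not the replay engine's code):

```typescript
// A replay turn matches if both sides got audio (or both got silence)
// and the replay packet count is within 30% of the original's.
function turnMatches(
  originalPkts: number,
  replayPkts: number,
  tolerance = 0.3,
): boolean {
  if (originalPkts === 0 || replayPkts === 0) {
    return originalPkts === replayPkts; // silence must match silence
  }
  return Math.abs(replayPkts - originalPkts) / originalPkts <= tolerance;
}
```

For example, 710 packets against an original 735 is a ~3% difference and passes; 400 against 735 (~46%) fails.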
--- 2-multi-turn ---
Turn 1: 710 pkts (original: 735) — MATCH
Turn 2: 698 pkts (original: 740) — MATCH
PASS | 37.2s (original: 37.0s)
Sequence: MATCH
Codec: PCMU (match)
The scenario runner exercises multi-turn calls without an LLM:
docker compose exec pinmoli npx tsx test/scenarios/run-scenarios.ts

| Scenario | Turns | Description |
|---|---|---|
| `1-new-customer` | 1 | Send greeting, listen for response |
| `2-multi-turn` | 2 | Greeting + follow-up |
| `3-listen-first` | 1 | Listen 8s for agent greeting before speaking |
| `4-silence-test` | 1+extra | Send once, then listen without speaking (timeout behavior) |
| `5-rapid-exchange` | 3 | Three quick turns — RTP continuity stress test |
Each scenario writes scenario-manifest.json + flow.json for later snapshot replay.
Every example below has a corresponding integration test in test/integration/readme-prompts.test.ts.
SIP basics:
Send OPTIONS to sip:trunk.example.com
INVITE sip:+15551234567@sip.livekit.cloud with opus and PCMU
Register at sip:pbx.example.com with username admin password secret
Codec negotiation:
Test with PCMA codec -- I want to verify A-law support
Call the agent using G722 and wait 20 seconds for a response
Test sip:pbx.example.com offering only PCMA and PCMU, see which it picks
DTMF and IVR navigation:
Call sip:+15551234567@trunk.example.com and press 1-2-3-# after the greeting
Call sip:+18005551234@trunk.example.com, press 1 for sales, then 0 for operator
Connect via WebRTC to https://agent.example.com/whip and enter PIN 1234#
Speech generation:
Generate speech saying "What is the weather today?" then call the agent
Generate a 1000Hz sine wave for 5 seconds, then test the endpoint
Make the greeting say "Por favor espere" in Spanish, then test
Generate speech with gemini saying "Hello, I need help with my account"
Bidirectional conversations:
Call sip:agent@example.com, listen for 5 seconds first, then send my greeting
INVITE sip:agent@livekit.cloud, send the greeting, wait 30 seconds for a response
WebRTC:
Test the WHIP endpoint at https://my-agent.example.com/whip with bearer token abc123
Save, load, and batch:
Save this test as "production-health-check"
Show me all saved tests, then run one
Compare sip:trunk-us.example.com and sip:trunk-eu.example.com
Test these servers: sip:a.example.com, sip:b.example.com, sip:c.example.com
Failure analysis:
Why did it fail?
What went wrong? (after a 488 codec mismatch)
Advanced combos:
Generate speech "Hello, I need billing support", call with PCMA, then press 2 for billing
Test sip:agent@broken-trunk.com, analyze the failure, fix it with TCP, save the config
| Provider | `--provider` | Env var | Default model |
|---|---|---|---|
| Anthropic | `anthropic` | `ANTHROPIC_API_KEY` | `claude-sonnet-4-5` |
| OpenAI | `openai` | `OPENAI_API_KEY` | `gpt-4o` |
| Google Gemini | `google` | `GEMINI_API_KEY` | `gemini-2.5-flash` |
| Google Vertex AI | `google-vertex` | `--service-account <path>` | `gemini-2.5-pro` |
| Groq | `groq` | `GROQ_API_KEY` | `llama-3.3-70b-versatile` |
| OpenRouter | `openrouter` | `OPENROUTER_API_KEY` | `anthropic/claude-sonnet-4.5` |
The provider is auto-detected from whichever env var you set. Use --provider to override:
pinmoli --provider openai --model gpt-4o
pinmoli --provider anthropic --model claude-haiku-4-5

pinmoli [options]
--provider <name> LLM provider (anthropic, openai, google, google-vertex, groq, openrouter)
--model <id> Model ID (default depends on provider)
--tts-model <id> Gemini TTS model (default: gemini-2.5-flash-tts, Vertex AI only)
--service-account <path> GCP service account JSON (implies google-vertex)
--help Show usage
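The auto-detection amounts to checking which credential variable is present. A sketch under the assumption of a fixed first-match-wins order (the actual precedence in Pinmoli may differ):

```typescript
// Env var -> provider name, checked in order; --provider overrides this.
const PROVIDER_VARS: Array<[string, string]> = [
  ["ANTHROPIC_API_KEY", "anthropic"],
  ["OPENAI_API_KEY", "openai"],
  ["GEMINI_API_KEY", "google"],
  ["GROQ_API_KEY", "groq"],
  ["OPENROUTER_API_KEY", "openrouter"],
];

function detectProvider(
  env: Record<string, string | undefined>,
): string | undefined {
  const hit = PROVIDER_VARS.find(([name]) => env[name]);
  return hit?.[1];
}

const detected = detectProvider({ GROQ_API_KEY: "gsk_test" });
```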
| Variable | Purpose |
|---|---|
| `ANTHROPIC_API_KEY` | Anthropic provider |
| `OPENAI_API_KEY` | OpenAI provider |
| `GEMINI_API_KEY` | Google Gemini provider |
| `GROQ_API_KEY` | Groq provider |
| `OPENROUTER_API_KEY` | OpenRouter provider |
| `GOOGLE_APPLICATION_CREDENTIALS` | Path to GCP service account JSON (Vertex AI) |
| `GOOGLE_CLOUD_PROJECT` | GCP project ID (default: `lifeandhalf-24122025`) |
| `GOOGLE_CLOUD_LOCATION` | Vertex AI region (default: `us-central1`) |
| `LIVEKIT_ENDPOINT` | LiveKit SIP endpoint for live tests |
| `PINMOLI_NO_CAPTURE` | Set to `1` to disable packet capture |
The default docker-compose.yml uses network_mode: host so SIP and RTP traffic reaches the network directly. The .env file at the repo root is loaded automatically. Source directory is bind-mounted, so code changes are reflected immediately.
Each Pinmoli session creates a directory under captures/ grouping all artifacts:
captures/{session-id}/
├── manifest.json # Tool calls with params, timing, success/failure
├── audio-samples/ # Generated TTS audio (espeak, Gemini)
├── sip-invite-host-20260320-181341/
│ ├── sip-log.txt # Every SIP message sent/received with ISO timestamps
│ ├── metadata.json # Config, duration, responses, codec, public IP
│ ├── flow.json # Structured signaling flow (for replay comparison)
│ ├── agent-greeting.wav # Agent's greeting (if sendDelay > 0)
│ ├── sent-audio.wav # Outbound audio (transcoded to negotiated codec)
│ └── agent-response.wav # Agent's response audio
├── sip-options-host-20260320-180000/
│ ├── sip-log.txt
│ ├── metadata.json
│ └── flow.json
└── webrtc-whip-host-20260320-182000/
├── signaling-log.txt # WHIP offer/answer exchange
├── metadata.json
├── flow.json
└── *.wav # Audio files (opus decoded via OGG + ffmpeg)
The manifest.json records every tool call the LLM made during the session, enabling replay mode to re-execute without the LLM.
Background tcpdump captures SIP (port 5060) + RTP (UDP 10000-65535) for every session. Saves to captures/pinmoli-YYYYMMDD-HHMMSS.pcap.
Docker Compose: Captures appear at ./captures/ automatically (bind mount).
Docker Run: Mount a volume:
docker run --rm -it --network host \
-v $(pwd)/captures:/app/captures \
-e ANTHROPIC_API_KEY=sk-ant-... \
ghcr.io/arakoodev/pinmoli

Open in Wireshark:
wireshark captures/pinmoli-20260305-143022.pcap

Disable capture with PINMOLI_NO_CAPTURE=1:

docker run --rm -it --network host \
-e PINMOLI_NO_CAPTURE=1 \
-e ANTHROPIC_API_KEY=sk-ant-... \
ghcr.io/arakoodev/pinmoli

| Sample | Description | Duration |
|---|---|---|
| `voice-hello` | "Hello, this is a test call from Pinmoli" | ~3s |
| `sine-440hz` | 440 Hz sine wave | 3s |
| `sine-1000hz` | 1000 Hz sine wave | 3s |
| `dtmf-123` | DTMF tones 1-2-3 | 1.5s |
| `silence` | Silence | 3s |
All samples are PCMU @ 8kHz mono (G.711 u-law).
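PCMU means each 16-bit linear PCM sample is companded to a single G.711 μ-law byte before hitting the wire. A textbook μ-law encoder sketch (the standard algorithm, not Pinmoli's transcoder):

```typescript
// G.711 mu-law companding: 16-bit signed linear PCM -> 8-bit byte.
function linearToMulaw(sample: number): number {
  const BIAS = 0x84; // 132, added before finding the segment
  const CLIP = 32635;
  const sign = sample < 0 ? 0x80 : 0;
  let mag = Math.min(Math.abs(sample), CLIP) + BIAS;

  // Find the segment (exponent): position of the highest set bit.
  let exponent = 7;
  for (let mask = 0x4000; (mag & mask) === 0 && exponent > 0; mask >>= 1) {
    exponent--;
  }
  const mantissa = (mag >> (exponent + 3)) & 0x0f;

  // All bits are inverted on the wire, so silence encodes as 0xFF.
  return ~(sign | (exponent << 4) | mantissa) & 0xff;
}
```

This is why a silent PCMU stream is a run of 0xFF bytes, and why the codec fits 8 kHz speech into 64 kbit/s.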
By default, generate_audio uses espeak (offline, fast). With Vertex AI configured, use Gemini TTS for higher quality:
Generate speech saying "Please transfer me to billing"
Generate speech with gemini saying "Hello, I need help with my account"
Gemini TTS requires Vertex AI (service account). The `GEMINI_API_KEY` path does not support TTS.
Pinmoli exposes 11 tools to the AI agent. You describe what you want and the agent picks the right tool.
One-shot tests:
| Tool | Purpose |
|---|---|
| `sip_test` | Run OPTIONS, INVITE, or REGISTER against a SIP endpoint. Supports DTMF. |
| `webrtc_test` | Connect to a WHIP endpoint, negotiate ICE/DTLS/SRTP, send/receive audio. Supports DTMF. |
Interactive multi-turn calls:
| Tool | Purpose |
|---|---|
| `start_call` | INVITE → 200 OK → ACK. Returns `callId` for subsequent tools. |
| `send_audio` | Send audio (auto-wired from `generate_audio`) or DTMF on an active call. |
| `receive_audio` | Listen for agent audio on an active call (1-60s), save as WAV. |
| `end_call` | BYE → close sockets → cleanup. Always call when done. |
Utilities:
| Tool | Purpose |
|---|---|
| `generate_audio` | Create audio samples (sine, DTMF, silence, TTS via espeak or Gemini). |
| `analyze_failure` | Diagnose a failed test and suggest fixes. |
| `save_test` | Save a test configuration by name (SQLite). |
| `load_test` | Load a saved test configuration by name. |
| `list_tests` | List all saved test configurations. |
Only one process can bind the SIP port. Kill the conflicting process inside the container:
docker compose exec pinmoli sh -c 'kill $(lsof -ti:5060)'

- NAT/firewall -- private IPs (WSL2 `172.x`, Docker `172.x`) are not routable. Run from a host with a public IP or use `network_mode: host`.
- No agent running -- the remote endpoint accepted the call but has no worker to generate audio.
Usually a synthetic 503 from the sip npm library when TCP drops. Common causes: agent worker not running, malformed SDP, unroutable IPs, missing ACK.
Check credentials are accessible inside the container:
# Vertex AI
docker compose exec pinmoli ls -la /app/secrets/gcp-service-account.json
# API key providers — verify .env is loaded
docker compose exec pinmoli env | grep API_KEY

git clone https://github.com/your-fork/pinmoli.git
cd pinmoli
docker compose build
docker compose up -d
# Run tests (must pass before submitting a PR)
docker compose exec pinmoli npx vitest run
docker compose exec pinmoli npx tsc --noEmit
docker compose exec pinmoli npm run lint

All commands run inside Docker. See ARCHITECTURE.md for codebase internals, engine design, and project structure.
MIT