feat: Security, WhatsApp gateway, Hands, API hardening (23 commits) by devatsecure · Pull Request #251 · RightNow-AI/openfang

devatsecure · 2026-03-03T06:25:42Z

feat(upstream): Local enhancements — security, WhatsApp gateway, hands, API hardening

Summary

22 feature/fix commits developed locally on top of upstream main, rebased onto current RightNow-AI/openfang main (v0.3.4). All conflicts resolved; build, tests, and clippy pass.
Focus: security hardening, WhatsApp gateway reliability, Hands/Twitter/email, shared HTTP client, CredentialVault, and dashboard/API fixes to improve mergeability and production readiness.

Changes

Added

CredentialVault (S2): Encrypted secret storage (AES-256-GCM, OS keyring). Wired into kernel and init; replaces unsafe budget ptr with budget_overrides RwLock.
Shared HTTP client: Single reqwest::Client for kernel (drivers, MCP, ClawHub, channels). Per-agent rate limiting and auth whitelist tightening.
arxiv-researcher bundled skill and daily tweet cron job.
Workflow persistence and DELETE /api/workflows/{id} endpoint.
OAuth 1.0a credentials for Twitter Hand (tweet posting).
Hand register_restored: Reconciles in-memory hand registry with agents restored from SQLite.
Email adapter read_only mode to prevent unwanted auto-replies.
WhatsApp gateway: Sender context, number allowlist, timeout increase; Baileys 6.x getMessage handler; self-healing and kernel health monitor; allowlist, memory leak, and socket cleanup fixes; Node 25 compatibility and clippy fixes.

Changed

API security: Error sanitization (no internal details in ClawHub/install responses), CORS/HSTS/SSRF hardening; security headers and CORS restricted to known origins.
Agent loop: Strip <thinking> tags from LLM responses; persist agent config to SQLite on update (with model/provider update and save_agent + warn on failure).
Provider test: Use default_model_for_provider for test request (aligned with upstream).
Dashboard: Total Cost fix ($0.00), skills page empty list fix, hands active state display.
Approval policy shorthands applied at boot.

Fixed

Critical unsafe blocks and security vulnerabilities (budget overrides, vault).
WhatsApp gateway reconnect instability, auto-reconnect on non-logout disconnect.
Hand registry reconciliation and sandbox env var passthrough.
Provider test and wizard agent creation bugs.
Clippy warnings and WhatsApp gateway Node 25 compatibility.
Post-rebase: pass HTTP client to ClawHubClient in clawhub_skill_code.

Test Results

cargo build --workspace --lib: ok
cargo test --workspace: passed
cargo clippy --workspace --all-targets -- -D warnings: 0 warnings

Test Plan

Run full test suite on maintainer side.
Smoke-test WhatsApp gateway with Node 20/22/25.
Verify dashboard: skills list, Total Cost, hands status.
Verify API: workflow CRUD, provider test, agent update persistence.
Optional: CredentialVault encrypt/decrypt and budget overrides API.

- Collapse nested else-if blocks in CLI doctor command to satisfy clippy - Add "type": "commonjs" to whatsapp-gateway package.json (Node 25 defaults to ESM) - Replace import.meta.url with __dirname in gateway index.js (import.meta is invalid in CJS) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Fix test_provider sending empty model string to Anthropic API (use cheapest model from catalog, preferring haiku) - Fix wizard TOML generation: remove invalid [agent] section wrapper, use correct field name 'model' instead of 'name' under [model], move system_prompt into [model] section instead of non-existent [prompt] - Skip invalid profile values (balanced/precise/creative) that don't match ToolProfile enum variants - Return detailed error messages from spawn_agent endpoint Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Show green "Active" button on hands page when a hand is already running - Load active instances on page init so Available tab reflects current state - WhatsApp gateway: resolve agent name to UUID before forwarding messages - Filter out group messages (only process direct chats @s.whatsapp.net) - Skip protocol/reaction messages that have no useful text content - Prevent echo loops by filtering fromMe messages Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Baileys 6.x requires a getMessage callback to handle pre-key message retries and decrypt incoming messages from new contacts. Without this, messages fail silently with "error in handling message" after fresh QR code pairing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…crease - Prepend sender name and phone to message text so agents can identify who they're chatting with (API MessageRequest has no metadata field) - Add allowlist filter to only process messages from approved numbers - Increase API timeout from 120s to 600s for long-running agent tasks Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Previously only reconnected on restartRequired/timedOut, treating undefined status codes as non-recoverable (QR expired). This caused the gateway to stay disconnected after random drops. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…cleanup - Move hardcoded phone allowlist to config (allowed_users in config.toml) passed as WHATSAPP_ALLOWED_USERS env var; empty = allow all - Add LRU eviction to msgStore (cap 500) to prevent unbounded memory growth - Clean up old Baileys socket before re-login to prevent leaked listeners - Add 5-minute TTL to agent ID cache so deleted/recreated agents are found Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…SQLite - Add strip_thinking_tags() to agent_loop.rs that removes <thinking>...</thinking> blocks from LLM text output before sending to channels (prevents chain-of-thought reasoning from leaking to WhatsApp users) - Applied in all 4 response paths: non-streaming/streaming EndTurn and MaxTokens - Persist agent manifest to SQLite after PATCH /api/agents/{id}/config so system prompt and other config changes survive daemon restarts - Added 5 unit tests for thinking tag stripping Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Security fixes: - WebSocket auth: constant-time token comparison (prevents timing attacks) - Middleware: remove /api/agents and /api/config from auth bypass list - Upload handler: sanitize filename metadata (path traversal prevention) - WhatsApp gateway: truncate messages exceeding 4096 chars Unsafe block fixes: - kernel.rs: replace unsafe peer_registry/peer_node ptr mutation with OnceLock - routes.rs: replace unsafe budget config ptr mutation with RwLock - routes.rs: serialize env var mutations with static Mutex Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add register_restored() to HandRegistry so hands activated in previous sessions show as "Active" after daemon restart instead of "Activate" - Reconcile restored agents with hand definitions during kernel boot - Include hand requirement env vars (ApiKey/EnvVar) in shell_exec sandbox allowed list so hands like Twitter can access their API tokens Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Add TWITTER_API_KEY, TWITTER_API_SECRET, TWITTER_ACCESS_TOKEN, and TWITTER_ACCESS_SECRET as hand requirements so they pass through the shell_exec sandbox - Update system prompt to use OAuth 1.0a (requests_oauthlib) for posting instead of Bearer Token which is read-only - Clarify Bearer Token is for reading only Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The gateway loses its Baileys WebSocket after system sleep/wake but the HTTP server stays alive, silently dropping messages. This adds: - Heartbeat watchdog in the gateway (30s interval, 90s stale threshold) that detects dead sockets and triggers automatic reconnection - Kernel health monitor loop that polls gateway /health every 30s and triggers POST /health/reconnect after 2 consecutive failures - GET /api/channels/whatsapp/health endpoint for dashboard visibility - Enhanced GET /api/health/detail with whatsapp_gateway status Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Security fixes from code audit follow-up: - S4: Sanitize error responses to not leak internal details (routes.rs) - S5: Validate OPENFANG_URL against localhost allowlist to prevent SSRF (index.js) - S9: Restrict CORS to explicit HTTP methods instead of wildcard (server.rs) - S12: Document unsafe-eval CSP requirement for Alpine.js (middleware.rs) - L3: Add HSTS header for HTTPS enforcement (middleware.rs) - L5: Document HTTP keep-alive timeout TODO (server.rs) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Secrets were stored as plaintext in ~/.openfang/secrets.env. The vault (AES-256-GCM + Argon2 KDF) already existed in openfang-extensions but was never wired up. This integrates it: - Kernel: auto-init vault on startup, migrate existing secrets.env entries to encrypted vault.enc, load vault secrets into env vars - Routes: write_secret_env/remove_secret_env now use vault first with plaintext file fallback if vault unavailable - CLI dotenv: load vault secrets at startup alongside .env/secrets.env - Graceful degradation: if vault can't init/unlock, falls back to plaintext secrets.env (no breakage) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The email adapter was replying to every incoming email (LinkedIn, Reddit, newsletters, etc.) because no sender filter or reply guard was configured. This adds: - read_only config option: when true, processes incoming emails but never sends SMTP replies (for newsletter ingestion use cases) - Wired through config -> channel_bridge -> adapter constructor Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Workflows now persist to ~/.openfang/workflows.json and survive restarts. Added DELETE /api/workflows/{id} endpoint for cleanup. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The /api/usage endpoint was missing cost_usd — it only returned token counts from the scheduler. Now pulls daily cost from UsageStore so the overview page displays actual spend. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The auto_approve config flag was never processed — apply_shorthands() was not called, so shell_exec always required approval even when the user set auto_approve = true. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Three root causes of repeated disconnects: 1. Conflict reconnect loop: When WhatsApp returns conflict (440), gateway retried in 3s creating competing sessions. Now uses exponential backoff (15s, 30s, 45s, max 60s) for conflicts. 2. Daemon and gateway fighting: Both the gateway's own reconnect and the daemon health loop triggered reconnects simultaneously. Gateway now reports connStatus='reconnecting' so the daemon backs off. /health/reconnect rejects calls when already reconnecting. Daemon has 90s cooldown after triggering reconnect. 3. Max restarts too low: 3 restarts with short delays meant one bad sleep/wake cycle permanently killed the gateway. Increased to 10 restarts with delays up to 60s. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…ening P1: Replace 60+ independent reqwest::Client::new() calls with SharedHttpClients on the kernel (default: 30s timeout + 20 idle connections, streaming: no timeout). All channel adapters, LLM drivers, and runtime tools now share pooled connections. S6: Add per-agent GCRA rate limiting (200 tokens/min) to prevent one agent from starving others. Applied in send_message handler alongside existing per-IP limits. S3: Remove /api/budget, /api/sessions, and /api/profiles from the unauthenticated public endpoint whitelist — these now require Bearer auth when API key is set. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

New prompt-only skill teaching agents to discover, parse, and summarize arXiv papers (cs.AI, cs.CL, cs.SE, cs.LG). Registered as bundled skill RightNow-AI#61. Daily cron job + workflow created for twitter-hand to fetch papers and tweet. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The list_skills handler only loaded user-installed skills but not the 61 bundled skills, causing the dashboard skills page to show errors. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Made-with: Cursor

… gitignore - WhatsApp gateway auto-connects from saved session on startup - Update Cargo.lock for v0.3.4 version bumps - Add whatsapp-gateway package-lock.json - Add research docs (multilingual chatbots, WhatsApp prompt best practices) - Gitignore: .nwave/, PR_DESCRIPTION.md, patches/, desktop gen/ schemas Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Routes LLM requests through a local Anthropic Messages API-compatible proxy at localhost:3456, using the Claude Code subscription instead of paid API credits. No API key required. - Register claude-code-proxy in driver factory (reuses AnthropicDriver) - Add provider info, base URL constant, and 3 model catalog entries - Add to infer_provider_from_model prefix list Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

…driver - Retry on 408 (queue timeout) in addition to 429/529 - Parse Retry-After header for server-directed backoff instead of hardcoded exponential delays - Falls back to exponential backoff when header is absent - Applied to both streaming and non-streaming code paths Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Local proxy and complex tool-use requests can take 60-120s. The 30s timeout caused premature connection drops before the LLM could respond. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

devatsecure and others added 27 commits March 3, 2026 11:15

Add workflow persistence and DELETE endpoint

2ac04d2

Workflows now persist to ~/.openfang/workflows.json and survive restarts. Added DELETE /api/workflows/{id} endpoint for cleanup. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix Total Cost showing $0.00 on dashboard overview

ab7b433

The /api/usage endpoint was missing cost_usd — it only returned token counts from the scheduler. Now pulls daily cost from UsageStore so the overview page displays actual spend. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Apply approval policy shorthands at boot

717f2d0

The auto_approve config flag was never processed — apply_shorthands() was not called, so shell_exec always required approval even when the user set auto_approve = true. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix dashboard skills page showing empty list

1ebf035

The list_skills handler only loaded user-installed skills but not the 61 bundled skills, causing the dashboard skills page to show errors. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix(api): pass HTTP client to ClawHubClient in clawhub_skill_code

993ea3e

Made-with: Cursor

Increase HTTP client timeout from 30s to 120s for LLM requests

a231e51

Local proxy and complex tool-use requests can take 60-120s. The 30s timeout caused premature connection drops before the LLM could respond. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: Security, WhatsApp gateway, Hands, API hardening (23 commits)#251

feat: Security, WhatsApp gateway, Hands, API hardening (23 commits)#251
devatsecure wants to merge 27 commits intoRightNow-AI:mainfrom
devatsecure:main

devatsecure commented Mar 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

devatsecure commented Mar 3, 2026

feat(upstream): Local enhancements — security, WhatsApp gateway, hands, API hardening

Summary

Changes

Added

Changed

Fixed

Test Results

Test Plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant