Skip to content

feat: Security, WhatsApp gateway, Hands, API hardening (23 commits)#251

Open
devatsecure wants to merge 27 commits intoRightNow-AI:mainfrom
devatsecure:main
Open

feat: Security, WhatsApp gateway, Hands, API hardening (23 commits)#251
devatsecure wants to merge 27 commits intoRightNow-AI:mainfrom
devatsecure:main

Conversation

@devatsecure
Copy link

feat(upstream): Local enhancements — security, WhatsApp gateway, hands, API hardening

Summary

  • 22 feature/fix commits developed locally on top of upstream main, rebased onto current RightNow-AI/openfang main (v0.3.4). All conflicts resolved; build, tests, and clippy pass.
  • Focus: security hardening, WhatsApp gateway reliability, Hands/Twitter/email, shared HTTP client, CredentialVault, and dashboard/API fixes to improve mergeability and production readiness.

Changes

Added

  • CredentialVault (S2): Encrypted secret storage (AES-256-GCM, OS keyring). Wired into kernel and init; replaces unsafe budget ptr with budget_overrides RwLock.
  • Shared HTTP client: Single reqwest::Client for kernel (drivers, MCP, ClawHub, channels). Per-agent rate limiting and auth whitelist tightening.
  • arxiv-researcher bundled skill and daily tweet cron job.
  • Workflow persistence and DELETE /api/workflows/{id} endpoint.
  • OAuth 1.0a credentials for Twitter Hand (tweet posting).
  • Hand register_restored: Reconciles in-memory hand registry with agents restored from SQLite.
  • Email adapter read_only mode to prevent unwanted auto-replies.
  • WhatsApp gateway: Sender context, number allowlist, timeout increase; Baileys 6.x getMessage handler; self-healing and kernel health monitor; allowlist, memory leak, and socket cleanup fixes; Node 25 compatibility and clippy fixes.

Changed

  • API security: Error sanitization (no internal details in ClawHub/install responses), CORS/HSTS/SSRF hardening; security headers and CORS restricted to known origins.
  • Agent loop: Strip <thinking> tags from LLM responses; persist agent config to SQLite on update (with model/provider update and save_agent + warn on failure).
  • Provider test: Use default_model_for_provider for test request (aligned with upstream).
  • Dashboard: Total Cost fix ($0.00), skills page empty list fix, hands active state display.
  • Approval policy shorthands applied at boot.

Fixed

  • Critical unsafe blocks and security vulnerabilities (budget overrides, vault).
  • WhatsApp gateway reconnect instability, auto-reconnect on non-logout disconnect.
  • Hand registry reconciliation and sandbox env var passthrough.
  • Provider test and wizard agent creation bugs.
  • Clippy warnings and WhatsApp gateway Node 25 compatibility.
  • Post-rebase: pass HTTP client to ClawHubClient in clawhub_skill_code.

Test Results

  • cargo build --workspace --lib: ok
  • cargo test --workspace: passed
  • cargo clippy --workspace --all-targets -- -D warnings: 0 warnings

Test Plan

  • Run full test suite on maintainer side.
  • Smoke-test WhatsApp gateway with Node 20/22/25.
  • Verify dashboard: skills list, Total Cost, hands status.
  • Verify API: workflow CRUD, provider test, agent update persistence.
  • Optional: CredentialVault encrypt/decrypt and budget overrides API.

devatsecure and others added 27 commits March 3, 2026 11:15
- Collapse nested else-if blocks in CLI doctor command to satisfy clippy
- Add "type": "commonjs" to whatsapp-gateway package.json (Node 25 defaults to ESM)
- Replace import.meta.url with __dirname in gateway index.js (import.meta is invalid in CJS)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Fix test_provider sending empty model string to Anthropic API (use
  cheapest model from catalog, preferring haiku)
- Fix wizard TOML generation: remove invalid [agent] section wrapper,
  use correct field name 'model' instead of 'name' under [model],
  move system_prompt into [model] section instead of non-existent [prompt]
- Skip invalid profile values (balanced/precise/creative) that don't
  match ToolProfile enum variants
- Return detailed error messages from spawn_agent endpoint

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Show green "Active" button on hands page when a hand is already running
- Load active instances on page init so Available tab reflects current state
- WhatsApp gateway: resolve agent name to UUID before forwarding messages
- Filter out group messages (only process direct chats @s.whatsapp.net)
- Skip protocol/reaction messages that have no useful text content
- Prevent echo loops by filtering fromMe messages

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Baileys 6.x requires a getMessage callback to handle pre-key message
retries and decrypt incoming messages from new contacts. Without this,
messages fail silently with "error in handling message" after fresh
QR code pairing.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…crease

- Prepend sender name and phone to message text so agents can identify
  who they're chatting with (API MessageRequest has no metadata field)
- Add allowlist filter to only process messages from approved numbers
- Increase API timeout from 120s to 600s for long-running agent tasks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Previously only reconnected on restartRequired/timedOut, treating
undefined status codes as non-recoverable (QR expired). This caused
the gateway to stay disconnected after random drops.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…cleanup

- Move hardcoded phone allowlist to config (allowed_users in config.toml)
  passed as WHATSAPP_ALLOWED_USERS env var; empty = allow all
- Add LRU eviction to msgStore (cap 500) to prevent unbounded memory growth
- Clean up old Baileys socket before re-login to prevent leaked listeners
- Add 5-minute TTL to agent ID cache so deleted/recreated agents are found

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…SQLite

- Add strip_thinking_tags() to agent_loop.rs that removes <thinking>...</thinking>
  blocks from LLM text output before sending to channels (prevents chain-of-thought
  reasoning from leaking to WhatsApp users)
- Applied in all 4 response paths: non-streaming/streaming EndTurn and MaxTokens
- Persist agent manifest to SQLite after PATCH /api/agents/{id}/config so system
  prompt and other config changes survive daemon restarts
- Added 5 unit tests for thinking tag stripping

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Security fixes:
- WebSocket auth: constant-time token comparison (prevents timing attacks)
- Middleware: remove /api/agents and /api/config from auth bypass list
- Upload handler: sanitize filename metadata (path traversal prevention)
- WhatsApp gateway: truncate messages exceeding 4096 chars

Unsafe block fixes:
- kernel.rs: replace unsafe peer_registry/peer_node ptr mutation with OnceLock
- routes.rs: replace unsafe budget config ptr mutation with RwLock
- routes.rs: serialize env var mutations with static Mutex

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add register_restored() to HandRegistry so hands activated in previous
  sessions show as "Active" after daemon restart instead of "Activate"
- Reconcile restored agents with hand definitions during kernel boot
- Include hand requirement env vars (ApiKey/EnvVar) in shell_exec sandbox
  allowed list so hands like Twitter can access their API tokens

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Add TWITTER_API_KEY, TWITTER_API_SECRET, TWITTER_ACCESS_TOKEN, and
  TWITTER_ACCESS_SECRET as hand requirements so they pass through the
  shell_exec sandbox
- Update system prompt to use OAuth 1.0a (requests_oauthlib) for posting
  instead of Bearer Token which is read-only
- Clarify Bearer Token is for reading only

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The gateway loses its Baileys WebSocket after system sleep/wake but the
HTTP server stays alive, silently dropping messages. This adds:

- Heartbeat watchdog in the gateway (30s interval, 90s stale threshold)
  that detects dead sockets and triggers automatic reconnection
- Kernel health monitor loop that polls gateway /health every 30s and
  triggers POST /health/reconnect after 2 consecutive failures
- GET /api/channels/whatsapp/health endpoint for dashboard visibility
- Enhanced GET /api/health/detail with whatsapp_gateway status

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Security fixes from code audit follow-up:
- S4: Sanitize error responses to not leak internal details (routes.rs)
- S5: Validate OPENFANG_URL against localhost allowlist to prevent SSRF (index.js)
- S9: Restrict CORS to explicit HTTP methods instead of wildcard (server.rs)
- S12: Document unsafe-eval CSP requirement for Alpine.js (middleware.rs)
- L3: Add HSTS header for HTTPS enforcement (middleware.rs)
- L5: Document HTTP keep-alive timeout TODO (server.rs)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Secrets were stored as plaintext in ~/.openfang/secrets.env. The vault
(AES-256-GCM + Argon2 KDF) already existed in openfang-extensions but
was never wired up. This integrates it:

- Kernel: auto-init vault on startup, migrate existing secrets.env
  entries to encrypted vault.enc, load vault secrets into env vars
- Routes: write_secret_env/remove_secret_env now use vault first with
  plaintext file fallback if vault unavailable
- CLI dotenv: load vault secrets at startup alongside .env/secrets.env
- Graceful degradation: if vault can't init/unlock, falls back to
  plaintext secrets.env (no breakage)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The email adapter was replying to every incoming email (LinkedIn,
Reddit, newsletters, etc.) because no sender filter or reply guard
was configured. This adds:

- read_only config option: when true, processes incoming emails but
  never sends SMTP replies (for newsletter ingestion use cases)
- Wired through config -> channel_bridge -> adapter constructor

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Workflows now persist to ~/.openfang/workflows.json and survive restarts.
Added DELETE /api/workflows/{id} endpoint for cleanup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The /api/usage endpoint was missing cost_usd — it only returned token
counts from the scheduler. Now pulls daily cost from UsageStore so the
overview page displays actual spend.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The auto_approve config flag was never processed — apply_shorthands()
was not called, so shell_exec always required approval even when the
user set auto_approve = true.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Three root causes of repeated disconnects:

1. Conflict reconnect loop: When WhatsApp returns conflict (440),
   gateway retried in 3s creating competing sessions. Now uses
   exponential backoff (15s, 30s, 45s, max 60s) for conflicts.

2. Daemon and gateway fighting: Both the gateway's own reconnect
   and the daemon health loop triggered reconnects simultaneously.
   Gateway now reports connStatus='reconnecting' so the daemon
   backs off. /health/reconnect rejects calls when already
   reconnecting. Daemon has 90s cooldown after triggering reconnect.

3. Max restarts too low: 3 restarts with short delays meant one
   bad sleep/wake cycle permanently killed the gateway. Increased
   to 10 restarts with delays up to 60s.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ening

P1: Replace 60+ independent reqwest::Client::new() calls with SharedHttpClients
on the kernel (default: 30s timeout + 20 idle connections, streaming: no timeout).
All channel adapters, LLM drivers, and runtime tools now share pooled connections.

S6: Add per-agent GCRA rate limiting (200 tokens/min) to prevent one agent from
starving others. Applied in send_message handler alongside existing per-IP limits.

S3: Remove /api/budget, /api/sessions, and /api/profiles from the unauthenticated
public endpoint whitelist — these now require Bearer auth when API key is set.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New prompt-only skill teaching agents to discover, parse, and summarize
arXiv papers (cs.AI, cs.CL, cs.SE, cs.LG). Registered as bundled skill RightNow-AI#61.
Daily cron job + workflow created for twitter-hand to fetch papers and tweet.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The list_skills handler only loaded user-installed skills but not the
61 bundled skills, causing the dashboard skills page to show errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… gitignore

- WhatsApp gateway auto-connects from saved session on startup
- Update Cargo.lock for v0.3.4 version bumps
- Add whatsapp-gateway package-lock.json
- Add research docs (multilingual chatbots, WhatsApp prompt best practices)
- Gitignore: .nwave/, PR_DESCRIPTION.md, patches/, desktop gen/ schemas

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Routes LLM requests through a local Anthropic Messages API-compatible
proxy at localhost:3456, using the Claude Code subscription instead of
paid API credits. No API key required.

- Register claude-code-proxy in driver factory (reuses AnthropicDriver)
- Add provider info, base URL constant, and 3 model catalog entries
- Add to infer_provider_from_model prefix list

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…driver

- Retry on 408 (queue timeout) in addition to 429/529
- Parse Retry-After header for server-directed backoff instead of
  hardcoded exponential delays
- Falls back to exponential backoff when header is absent
- Applied to both streaming and non-streaming code paths

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Local proxy and complex tool-use requests can take 60-120s. The 30s
timeout caused premature connection drops before the LLM could respond.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant