Watchdog health check fails when gateway.tailscale.mode=serve (WS CLI broken) #21

## Problem

When `gateway.tailscale.mode` is set to `"serve"`, all OpenClaw CLI commands that use WebSocket connections to the gateway fail with:

```
gateway connect failed: Error: gateway closed (1000): 
nodes status failed: Error: gateway closed (1000 normal closure): no close reason
Gateway target: ws://127.0.0.1:18789
Source: local loopback
```

This affects:
- `openclaw health --json` (used by AlphaClaw watchdog for health checks)
- `openclaw nodes status --json` (used by `/api/nodes` route)
- `openclaw nodes pending --json`
- `openclaw devices list --json`

The HTTP `/health` endpoint works fine — only WS-based CLI commands are broken.

## Impact

1. **False health failures**: AlphaClaw's watchdog uses `openclaw health --json` for health checks. When these fail, the UI shows the gateway as "constantly restarting" even though it's running fine.

2. **Hung process flood**: The `/api/nodes` dashboard route spawns `openclaw nodes status` and `openclaw nodes pending` CLI processes that hang indefinitely because the WS handshake never completes. Dozens of these stuck processes accumulate and consume CPU and memory.

3. **Gateway overload**: The stalled WS connections from those processes flood the gateway with handshake timeouts, degrading performance.
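
The unbounded hang in item 2 can be capped by giving each CLI subprocess a deadline. The following is a minimal sketch, not AlphaClaw's actual code: `runCli` is a hypothetical helper name, and the 5-second default is an arbitrary assumption. It relies only on the `timeout` and `killSignal` options of Node's `execFile`.

```typescript
import { execFile } from "node:child_process";

// Hypothetical helper: run a CLI command and kill it once timeoutMs elapses,
// so a stalled WS handshake cannot leave the process hanging forever.
function runCli(command: string, args: string[], timeoutMs = 5000): Promise<string> {
  return new Promise((resolve, reject) => {
    execFile(
      command,
      args,
      { timeout: timeoutMs, killSignal: "SIGKILL" },
      (error, stdout) => {
        if (error) {
          // On timeout Node kills the child and sets error.killed to true.
          reject(error);
          return;
        }
        resolve(stdout);
      },
    );
  });
}
```

A route handler could then call, for example, `runCli("openclaw", ["nodes", "status", "--json"])` and treat a rejection as "status unavailable" instead of waiting indefinitely.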

## Root Cause

Tailscale Serve intercepts loopback traffic to the gateway port (18789) and appears to break the WS handshake for CLI subprocess connections. When `gateway.tailscale.mode` is set to `"off"`, all CLI commands complete immediately.

## Environment

- OpenClaw: 2026.3.13
- AlphaClaw: 0.8.0
- Node: v24.14.0
- OS: Ubuntu Noble (Linux 6.8.0-106-generic)
- Tailscale: 1.94.2

## Workaround

Set `gateway.tailscale.mode: "off"` and use SSH tunnels or `gateway.bind: "tailnet"` for remote node connections instead of Tailscale Serve.

## Suggested Fix

Consider using the HTTP `/health` endpoint for watchdog health checks instead of the WS-based CLI, since HTTP works regardless of Tailscale Serve configuration. The `/api/nodes` route could also use an HTTP-based approach or add a timeout to prevent zombie processes.
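
As an illustration of the HTTP-based check, here is a minimal sketch under stated assumptions. `checkGatewayHealth`, `HealthResult`, and `FetchLike` are hypothetical names. It assumes `/health` is served over HTTP on the same host and port as the WS target (`127.0.0.1:18789`) and that any 2xx status means healthy. The response body format is not specified in this report, so it is not inspected.

```typescript
// Minimal fetch-like signature so the probe can be exercised with a stub.
type FetchLike = (
  url: string,
  init?: { signal?: AbortSignal },
) => Promise<{ ok: boolean; status: number }>;

interface HealthResult {
  healthy: boolean;
  detail: string;
}

// Hypothetical helper: probe the gateway's HTTP /health endpoint with a deadline.
// Assumption: a 2xx status means healthy; the body is ignored.
async function checkGatewayHealth(
  baseUrl = "http://127.0.0.1:18789", // assumed: same port as the WS target
  timeoutMs = 3000,
  fetchImpl: FetchLike = fetch,
): Promise<HealthResult> {
  try {
    const res = await fetchImpl(`${baseUrl}/health`, {
      signal: AbortSignal.timeout(timeoutMs),
    });
    return { healthy: res.ok, detail: `HTTP ${res.status}` };
  } catch (err) {
    // Network errors and timeouts both land here; report instead of throwing.
    return {
      healthy: false,
      detail: err instanceof Error ? err.message : String(err),
    };
  }
}
```

Because the request carries its own deadline, a watchdog built on this pattern would report a failure after `timeoutMs` at the latest rather than hanging on a connection that never completes.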
