Practical reference for running rustbgpd in production. For config syntax, see CONFIGURATION.md. For security posture, see SECURITY.md.

    rustbgpd /etc/rustbgpd/config.toml

Or via systemd (see examples/systemd/rustbgpd.service):

    sudo systemctl start rustbgpd

The daemon validates the config file at startup. Validation errors display rustc-style diagnostics showing the offending TOML line with column markers:

    error: invalid hold_time 2: must be 0 or >= 3
      --> /etc/rustbgpd/config.toml:12:13
       |
    12 | hold_time = 2
       |             ^ must be 0 or >= 3

The daemon exits with code 1 — it never starts with an invalid config.
On success, structured JSON logs go to stdout. The daemon is ready when you
see the starting rustbgpd log line with version, ASN, and router ID.
Set log_level on any neighbor or peer group to override the global log level:

    [[neighbors]]
    address = "10.0.0.1"
    remote_asn = 65001
    log_level = "debug"

Or filter via RUST_LOG using the per-peer tracing span:

    RUST_LOG=info,peer{peer_addr=10.0.0.1}=debug rustbgpd /etc/rustbgpd/config.toml

Validate a config file without starting the daemon:

    rustbgpd --check /etc/rustbgpd/config.toml

Prints rustc-style diagnostics on error, or config OK on success.

Preview what a SIGHUP reload would change before sending it:

    # Compare proposed config against current config
    rustbgpd --diff /tmp/new-config.toml /etc/rustbgpd/config.toml
    # JSON output for scripting
    rustbgpd --diff /tmp/new-config.toml /etc/rustbgpd/config.toml --json

Output is grouped into three sections:

- Reload-applied changes: neighbor add/remove/modify that SIGHUP will reconcile immediately.
- Restart-required changes: `[global]`, `[rpki]`, `[bmp]`, `[mrt]` changes that require a full daemon restart.
- Informational: peer group and policy changes that are detected but not reconciled by the current SIGHUP path. Shown for visibility only.

Exit codes: 0 = no actionable changes, 1 = actionable changes found, 2 = error (bad config, missing file).
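
For scripting, the documented exit codes can drive a simple gate in a deploy pipeline. The sketch below is a minimal illustration and not part of rustbgpd: `diff_action` is a hypothetical helper name, and treating any undocumented exit code as a failure is an assumption.

```shell
# Hypothetical helper: map a `rustbgpd --diff` exit code to an action keyword.
diff_action() {
    case "$1" in
        0) echo "skip" ;;     # no actionable changes
        1) echo "review" ;;   # actionable changes found
        2) echo "abort" ;;    # error: bad config or missing file
        *) echo "abort" ;;    # undocumented code, treated as failure (assumption)
    esac
}

# Usage sketch, with the paths from the example above:
#   rustbgpd --diff /tmp/new-config.toml /etc/rustbgpd/config.toml
#   action="$(diff_action "$?")"
```

The keyword only says whether something changed. The text or `--json` output still needs review to see which of the three sections each change falls into.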

    sudo systemctl reload rustbgpd
    # or: kill -HUP $(pidof rustbgpd)

What happens:

- The daemon re-reads the TOML config file from disk.
- `diff_neighbors()` computes per-peer add/remove/change deltas.
- New peers are added, removed peers get NOTIFICATION and teardown, changed peers are removed and re-added.
- Global section changes (`[global]`, `[rpki]`, `[bmp]`, `[mrt]`) are logged as warnings and require a full restart to take effect.
- Peer group and policy changes are not currently reconciled: they are detected and logged but not applied. This is a known limitation tracked in the roadmap.

Reload failures are logged per-peer. If reconciliation fails, the daemon keeps the previous in-memory config and continues running.
Use rustbgpd --diff to preview changes before reloading.
| State | Where | When |
|---|---|---|
| Neighbor add/delete via gRPC | Config file (atomic write) | Immediately on mutation |
| GR restart marker | `<runtime_state_dir>/gr-restart.toml` | On coordinated shutdown |
| MRT dump files | `[mrt] output_dir` | On periodic timer or TriggerMrtDump |
| gRPC UDS socket | `<runtime_state_dir>/grpc.sock` | Daemon lifetime |

Not persisted: routing state (Adj-RIB-In, Loc-RIB, Adj-RIB-Out), policy evaluation state, RPKI VRP tables, BMP client state. All routing state is rebuilt from peers after restart.

1. Build the new version: `cargo build --release`
2. Stop the daemon: `systemctl stop rustbgpd` (or `rustbgpctl shutdown`)
3. Replace the binary at `/usr/local/bin/rustbgpd`
4. Start: `systemctl start rustbgpd`

When Graceful Restart is enabled (the default), the coordinated shutdown in
step 2 writes a GR restart marker. On step 4, the daemon advertises R=1 to
static peers, asking them to retain our routes while we reconnect. The restart
window is the largest gr_restart_time among all GR-enabled peers.
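
As a rough worked example of the restart window rule, the largest configured value can be pulled out of a config file with awk. This is a sketch under stated assumptions: it treats every `gr_restart_time = N` line as belonging to a GR-enabled peer, and it ignores defaults and any peer-group inheritance the daemon may apply. `gr_restart_window` is a hypothetical helper name.

```shell
# Hypothetical helper: print the largest gr_restart_time found on stdin.
# Assumes plain `gr_restart_time = N` lines; prints 0 when none are present.
gr_restart_window() {
    awk -F= '
        /^[ \t]*gr_restart_time[ \t]*=/ {
            v = $2 + 0              # numeric value after "="
            if (v > max) max = v
        }
        END { print max + 0 }
    '
}

# Usage sketch:
#   gr_restart_window < /etc/rustbgpd/config.toml
```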
For zero-downtime upgrades in a route-server pair, drain traffic to the standby, upgrade, then swap.
The daemon treats an unexpected gRPC server exit as fatal and initiates a coordinated shutdown (NOTIFICATION to all peers, GR marker write). This is deliberate: losing the control plane means losing the ability to shut down cleanly later. See ADR-0022.
Each RTR client reconnects independently after a fixed retry_interval
(default 600s). If no fresh EndOfData arrives before
expire_interval (default 7200s), cached VRPs for that server are discarded.
Routes are re-validated against the remaining VRP table.
When all caches are down, the VRP table is empty and all routes have
validation state NotFound. If your policy denies NotFound routes, this
will cause route drops. The recommended policy is to deny Invalid and
prefer Valid, leaving NotFound as a neutral fallback.
Each BMP client reconnects independently with backoff (default
reconnect_interval = 30s). During disconnection, BMP events for that
collector are dropped. No routing state is affected — BMP is purely
observational. On reconnect, the client sends a fresh Initiation message;
the collector rebuilds state from subsequent Peer Up and Route Monitoring
messages.
If the output directory is not writable, the MRT manager logs an error and skips that dump cycle. Periodic dumps continue on the next interval. The daemon does not crash on MRT failures.
When a peer sends more prefixes than max_prefixes, the daemon sends a
NOTIFICATION (Cease / Maximum Number of Prefixes Reached) and tears down the
session. The peer is not automatically re-enabled — use
rustbgpctl neighbor <addr> enable or the gRPC EnableNeighbor RPC to
restart it.
Metrics are exposed on the Prometheus endpoint if prometheus_addr is
configured. If omitted, metrics are still collected internally and available
via gRPC GetMetrics and GetHealth RPCs.
| Metric | What it tells you |
|---|---|
| `bgp_peers_established` | Number of peers in Established state |
| `bgp_peers_configured` | Total configured peers |
| `bgp_uptime_seconds` | Daemon uptime |

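
One simple health check derived from these metrics is the gap between configured and established peers. The sketch below parses Prometheus text exposition from stdin. It assumes both metrics are exported without labels, and `peers_not_established` is a hypothetical helper name. `METRICS_URL` in the usage comment is a placeholder for the endpoint served on `prometheus_addr`.

```shell
# Hypothetical helper: print configured minus established peers.
# Reads Prometheus text format on stdin; "# HELP" and "# TYPE" lines are
# skipped because their first field is "#".
peers_not_established() {
    awk '
        $1 == "bgp_peers_established" { est = $2 }
        $1 == "bgp_peers_configured"  { cfg = $2 }
        END { print cfg - est }
    '
}

# Usage sketch (METRICS_URL is a placeholder):
#   curl -fsS "$METRICS_URL" | peers_not_established
```

A non-zero result is worth correlating with the per-peer state shown by `rustbgpctl neighbor`.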
| Metric | What it tells you |
|---|---|
| `bgp_rib_prefixes{table="loc_rib"}` | Loc-RIB size (best paths) |
| `bgp_rib_prefixes{table="adj_rib_in"}` | Total received prefixes |
| `bgp_rib_prefixes{table="adj_rib_out"}` | Total advertised prefixes |
| `bgp_updates_received_total` | Inbound UPDATE count |
| `bgp_updates_sent_total` | Outbound UPDATE count |

| Metric | What it tells you |
|---|---|
| `bgp_gr_active_peers` | Peers currently in GR stale-route state |
| `bgp_gr_stale_routes` | Routes currently marked stale |
| `bgp_gr_timer_expired_total` | GR timers that expired (routes swept) |

| Metric | What it tells you |
|---|---|
| `bgp_rpki_vrp_count{af="ipv4"}` | IPv4 VRP entries loaded |
| `bgp_rpki_vrp_count{af="ipv6"}` | IPv6 VRP entries loaded |

A sudden drop in VRP count likely means a cache connection was lost or the cache itself has stale data.
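
A floor check on the total VRP count is one way to catch such a drop. The following is a sketch, assuming the Prometheus text format with the value as the last field on each sample line. `vrp_total` and `vrp_above_floor` are hypothetical helper names, and the floor value is something to pick from your own baseline.

```shell
# Hypothetical helper: sum bgp_rpki_vrp_count across address families.
vrp_total() {
    awk 'index($0, "bgp_rpki_vrp_count{") == 1 { sum += $NF }
         END { print sum + 0 }'
}

# Hypothetical helper: succeed when the total on stdin is >= the floor in $1.
vrp_above_floor() {
    total="$(vrp_total)"
    [ "$total" -ge "$1" ]
}

# Usage sketch (METRICS_URL is a placeholder for the Prometheus endpoint):
#   curl -fsS "$METRICS_URL" | vrp_above_floor 100000 || echo "VRP count low"
```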
rustbgpd uses structured JSON logging. Key messages to watch for:
| Message | Level | Meaning |
|---|---|---|
| `starting rustbgpd` | INFO | Daemon started successfully |
| `peer session established` | INFO | BGP session reached Established |
| `peer session down` | INFO | BGP session left Established |
| `received shutdown signal` | INFO | SIGTERM/SIGINT received |
| `shutdown initiated via gRPC` | INFO | Shutdown RPC called |
| `gRPC server exited unexpectedly` | ERROR | Fatal; coordinated shutdown follows |
| `config reloaded` | INFO | SIGHUP reload succeeded |
| `config reload failed` | ERROR | SIGHUP reload failed; previous config kept |
| `GR restart marker` | INFO | Restart marker written or read |
| `max-prefix limit exceeded` | WARN | Peer exceeded prefix limit |
| `gRPC TCP listener bound to a non-loopback address` | WARN | Security posture warning |

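
To pull only the actionable messages out of a log stream, a plain text match on the message strings is often enough. The sketch below matches three of the ERROR and WARN messages from the table. It deliberately does not parse the JSON, since the exact field layout is not described here, and `alert_lines` is a hypothetical helper name.

```shell
# Hypothetical helper: keep only log lines containing selected
# ERROR/WARN messages from the table. Reads log lines on stdin.
alert_lines() {
    grep -E 'gRPC server exited unexpectedly|config reload failed|max-prefix limit exceeded'
}

# Usage sketch:
#   journalctl -u rustbgpd --since "1 hour ago" | alert_lines
```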
1. Check peer state:

       rustbgpctl neighbor

   Look at the FSM state. `Active` means we're trying to connect but TCP isn't establishing. `OpenSent`/`OpenConfirm` means the OPEN exchange is failing.

2. Check logs for the peer:

       journalctl -u rustbgpd | grep "10.0.0.2"

   Look for NOTIFICATION codes, capability mismatches, or hold timer expiry.

3. Common causes:
   - TCP not reaching: Firewall, wrong address, peer not listening on 179
   - ASN mismatch: Remote peer has a different `remote-as` configured for us
   - Router ID collision: Two speakers with the same router ID
   - Hold timer zero vs non-zero: One side sends hold_time=0, the other expects keepalives
   - Capability mismatch: Check address family negotiation in OPEN logs
   - MD5 mismatch: TCP RST with no BGP-level error; check both sides' passwords
   - TTL security: GTSM requires TTL=255; multi-hop peers will fail

4. Verify from the remote side: Check FRR/BIRD/peer logs for their view of the session attempt.

Add a peer at runtime:

    rustbgpctl neighbor 10.0.0.5 add --asn 65005 --description "new-peer"

The peer is persisted to the config file automatically.

Remove a peer:

    rustbgpctl neighbor 10.0.0.5 delete

Sends NOTIFICATION, tears down the session, removes from config.

Soft-reset a peer:

    rustbgpctl neighbor 10.0.0.2 softreset

Re-applies import policy to all routes from this peer without tearing down the session.

Enable or disable a peer:

    rustbgpctl neighbor 10.0.0.2 enable
    rustbgpctl neighbor 10.0.0.2 disable --reason "maintenance"

Trigger an MRT dump:

    rustbgpctl mrt-dump

Watch live state:

    rustbgpctl top        # default 2s poll
    rustbgpctl top -i 5   # 5s poll interval

Shows sessions, prefix counts, message rates, RPKI VRP counts, and streaming route events in a terminal UI. Press h for keybindings.

Check daemon health:

    rustbgpctl health

Show routes received from a peer:

    rustbgpctl rib received 10.0.0.2

Show the RIB:

    rustbgpctl rib

Shut down the daemon:

    rustbgpctl shutdown

Sends NOTIFICATION to all peers, writes GR marker, exits cleanly.

Explain best-path selection for a prefix:

    rustbgpctl rib --prefix 10.0.0.0/24 --explain

Shows all candidates for a prefix with the decisive comparison reason for each non-winner (e.g., higher_local_pref, shorter_as_path).

Optional HTTP server for external looking glass frontends (Alice-LG, etc.). Configure in TOML:

    [global.telemetry.looking_glass]
    addr = "0.0.0.0:8080"

Endpoints: `/status`, `/protocols/bgp`, `/routes/protocol/{id}`, `/routes/peer/{peer}`. Omit the section entirely to disable.
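
A quick reachability probe of the two parameterless endpoints might look like the sketch below. It only builds the URLs; the response format is not described here, so the usage comment checks HTTP status codes and nothing more. `lg_probe_urls` is a hypothetical helper name, and the base URL assumes the port from the example config.

```shell
# Hypothetical helper: print the probe URLs for a looking glass base URL.
# A trailing slash on the base URL is tolerated.
lg_probe_urls() {
    base="${1%/}"
    for path in /status /protocols/bgp; do
        printf '%s%s\n' "$base" "$path"
    done
}

# Usage sketch: print the HTTP status code for each endpoint.
#   lg_probe_urls http://127.0.0.1:8080 | while read -r url; do
#       curl -sS -o /dev/null -w '%{http_code} %{url_effective}\n' "$url"
#   done
```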