Reliability & Resilience

The SDK is designed for hostile network environments. It protects against data loss through disk-backed buffering, automatic reconnection with exponential backoff, circuit breaking, and adaptive bandwidth management. Spilled messages remain subject to the size and age limits of the spill policy.

Spill Buffer

When the in-memory queue fills (e.g. during a network outage), messages spill to a file-backed ring buffer on disk. On reconnect, spilled messages drain before the live queue to preserve FIFO ordering.
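The drain order can be sketched in a few lines of Rust. This is an illustrative model, not the SDK's implementation: the `Worker` type, its methods, and the use of a `VecDeque` in place of the on-disk ring buffer are all assumptions made for the example. It also assumes that anything in the spill buffer is older than any message arriving after reconnect.

```rust
use std::collections::VecDeque;

/// Hypothetical worker model. `spill` stands in for the file-backed ring
/// buffer and `sent` stands in for the network; neither is an SDK type.
struct Worker {
    connected: bool,
    spill: VecDeque<u64>,
    sent: Vec<u64>,
}

impl Worker {
    fn new() -> Self {
        Worker { connected: true, spill: VecDeque::new(), sent: Vec::new() }
    }

    /// Handle one message taken from the live queue.
    fn handle(&mut self, seq: u64) {
        if !self.connected {
            // Link is down: keep the message on "disk" in arrival order.
            self.spill.push_back(seq);
            return;
        }
        // Older spilled messages must go out before this newer one.
        self.drain_spill();
        self.sent.push(seq);
    }

    fn disconnect(&mut self) {
        self.connected = false;
    }

    /// On reconnect, flush everything that was spilled, oldest first.
    fn reconnect(&mut self) {
        self.connected = true;
        self.drain_spill();
    }

    fn drain_spill(&mut self) {
        while let Some(seq) = self.spill.pop_front() {
            self.sent.push(seq);
        }
    }
}
```

Draining the spill buffer before touching the live queue is what keeps sequence numbers in order across an outage.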

SpillPolicy (YAML, auto-generated)

| Field | Type | Default | Description |
|---|---|---|---|
| `max_disk_bytes` | `u64` | `64 * 1024 * 1024` | Maximum size of the spill file in bytes. When the file exceeds this limit, the `EvictionStrategy` is applied. Default: **67,108,864** (64 MiB). |
| `max_age_secs` | `u64` | `3600` | Maximum age of spilled messages in seconds. Messages older than this are eligible for eviction regardless of disk usage. Default: **3,600** (1 hour). |
| `eviction` | `EvictionStrategy` | `OldestFirst` | Strategy applied when the spill file reaches `max_disk_bytes`. Default: `OldestFirst`. |
| `spill_path` | `String` | `/tmp/nerve-sdk-spill.bin` | File system path for the spill file. The directory must exist and be writable. The file is created lazily on the first spill. Default: `"/tmp/nerve-sdk-spill.bin"`. |
YAML
# Production spill config
spill_policy:
  max_disk_bytes: 134217728   # 128 MiB
  max_age_secs: 7200          # 2 hours
  eviction: oldest_first
  spill_path: /var/data/nerve-spill.bin
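For readers working from Rust, the policy fields and their documented defaults could be mirrored roughly as follows. This is a sketch under assumptions: the field names and defaults come from the table above, but the struct layout, the `Default` impl, and the `is_expired` / `over_budget` helpers are hypothetical and may not match the SDK's real definitions. Only the `OldestFirst` variant is shown because it is the only one named here.

```rust
/// Only the documented default variant is modelled; the SDK may define others.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum EvictionStrategy {
    OldestFirst,
}

/// Hypothetical mirror of the documented SpillPolicy fields.
#[derive(Debug, Clone, PartialEq, Eq)]
struct SpillPolicy {
    max_disk_bytes: u64,
    max_age_secs: u64,
    eviction: EvictionStrategy,
    spill_path: String,
}

impl Default for SpillPolicy {
    fn default() -> Self {
        SpillPolicy {
            max_disk_bytes: 64 * 1024 * 1024, // 64 MiB
            max_age_secs: 3600,               // 1 hour
            eviction: EvictionStrategy::OldestFirst,
            spill_path: "/tmp/nerve-sdk-spill.bin".to_string(),
        }
    }
}

impl SpillPolicy {
    /// A message older than `max_age_secs` is eligible for eviction.
    fn is_expired(&self, age_secs: u64) -> bool {
        age_secs > self.max_age_secs
    }

    /// The eviction strategy applies once the file exceeds `max_disk_bytes`.
    fn over_budget(&self, file_bytes: u64) -> bool {
        file_bytes > self.max_disk_bytes
    }
}
```

Struct update syntax (`..SpillPolicy::default()`) makes it easy to override only the fields that differ in production, as the YAML above does for size, age, and path.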

Circuit Breaker

After circuit_breaker_threshold (default: 5) consecutive connection failures, the circuit breaker trips to the Open state. While open, the worker makes no connection attempts and all messages spill to disk. After a cooldown period, the breaker enters HalfOpen and probes with a single connection attempt: a successful probe returns it to Closed, a failed probe reopens it.

graph LR
  C["Closed<br/><small>Healthy</small>"] -->|N failures| O["Open<br/><small>Tripped, spilling</small>"]
  O -->|cooldown| HO["HalfOpen<br/><small>Probing</small>"]
  HO -->|success| C
  HO -->|failure| O
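The transitions in the diagram map onto a small state machine. The sketch below is illustrative only: the type and method names are hypothetical, and because the cooldown duration is not stated here, the caller is assumed to signal when it has elapsed rather than the breaker tracking time itself.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CircuitState {
    Closed,
    Open,
    HalfOpen,
}

/// Hypothetical breaker; not the SDK's actual type.
struct CircuitBreaker {
    state: CircuitState,
    consecutive_failures: u32,
    threshold: u32,
}

impl CircuitBreaker {
    fn new(threshold: u32) -> Self {
        CircuitBreaker { state: CircuitState::Closed, consecutive_failures: 0, threshold }
    }

    /// Record a failed connection attempt.
    fn on_failure(&mut self) {
        match self.state {
            CircuitState::Closed => {
                self.consecutive_failures += 1;
                if self.consecutive_failures >= self.threshold {
                    self.state = CircuitState::Open;
                }
            }
            // A failed probe sends the breaker straight back to Open.
            CircuitState::HalfOpen => self.state = CircuitState::Open,
            CircuitState::Open => {}
        }
    }

    /// Record a successful connection: reset the counter and close.
    fn on_success(&mut self) {
        self.consecutive_failures = 0;
        self.state = CircuitState::Closed;
    }

    /// Called by the owner once the cooldown period has passed.
    fn on_cooldown_elapsed(&mut self) {
        if self.state == CircuitState::Open {
            self.state = CircuitState::HalfOpen;
        }
    }

    /// Connection attempts are allowed in every state except Open.
    fn may_connect(&self) -> bool {
        self.state != CircuitState::Open
    }
}
```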

Reconnection Strategy

On disconnect, the worker uses exponential backoff:

  • Base delay: backoff_base_ms (default: 100 ms)
  • Max delay: backoff_max_ms (default: 30,000 ms = 30 seconds)
  • Pattern: 100 ms → 200 ms → 400 ms → 800 ms → ... → 30 s (capped)
  • On success: Backoff resets to base. Circuit breaker transitions to Closed.
  • During backoff: All messages spill to disk automatically.
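The delay sequence above follows from doubling the base delay on each attempt and clamping at the maximum. A minimal sketch, assuming a zero-based attempt counter and no jitter (none is described here); the function name is hypothetical:

```rust
/// Delay before reconnect attempt number `attempt` (0-based):
/// base * 2^attempt, capped at `max_ms`. Saturating arithmetic keeps
/// very large attempt counts from overflowing.
fn backoff_delay_ms(attempt: u32, base_ms: u64, max_ms: u64) -> u64 {
    let factor = 1u64.checked_shl(attempt).unwrap_or(u64::MAX);
    base_ms.saturating_mul(factor).min(max_ms)
}
```

With the defaults (100 ms base, 30,000 ms cap), attempts 0 through 8 yield 100 ms up to 25,600 ms, and every attempt from 9 onward is clamped to 30,000 ms.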

Adaptive Bandwidth

The BandwidthEstimator samples RTT after each batch flush and reclassifies the link quality every 5 seconds. Batch parameters adjust automatically.

enum LinkTier (auto-generated)

| Variant | Fields | Description |
|---|---|---|
| `HighBandwidth` | | Low latency, high throughput (average RTT < 50 ms). Uses default MTU-sized batches with no compression overhead. Typical for LAN and co-located deployments. |
| `Constrained` | | Moderate latency (50 to 200 ms average RTT). Increases batch size to 4 KiB, extends flush interval to 50 ms, and enables compression. Typical for cellular (4G/5G) or cross-region links. |
| `Degraded` | | High latency or packet loss (average RTT > 200 ms). Maximises batch size to 8 KiB, extends flush interval to 200 ms, and forces compression. Typical for satellite links or severely degraded networks. |
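The tier thresholds can be read as a simple classification over the RTT samples collected in one window. The sketch below is an approximation under assumptions: it uses a plain arithmetic mean, treats exactly 50 ms and exactly 200 ms as Constrained, and ignores packet loss. `BatchParams`, `classify`, `classify_window`, and `batch_params` are hypothetical names. The `HighBandwidth` batch size and flush interval are left as `None` because their default values are not given here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LinkTier {
    HighBandwidth,
    Constrained,
    Degraded,
}

/// `None` means "SDK default, value not stated in this section".
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct BatchParams {
    max_batch_bytes: Option<usize>,
    flush_interval_ms: Option<u64>,
    compress: bool,
}

/// Map an average RTT in milliseconds to a tier.
fn classify(avg_rtt_ms: f64) -> LinkTier {
    if avg_rtt_ms < 50.0 {
        LinkTier::HighBandwidth
    } else if avg_rtt_ms <= 200.0 {
        LinkTier::Constrained
    } else {
        LinkTier::Degraded
    }
}

/// Classify one sampling window; returns None when no samples were taken.
fn classify_window(samples_ms: &[f64]) -> Option<LinkTier> {
    if samples_ms.is_empty() {
        return None;
    }
    let mean = samples_ms.iter().sum::<f64>() / samples_ms.len() as f64;
    Some(classify(mean))
}

/// Batch settings documented for each tier.
fn batch_params(tier: LinkTier) -> BatchParams {
    match tier {
        LinkTier::HighBandwidth => BatchParams {
            max_batch_bytes: None,
            flush_interval_ms: None,
            compress: false,
        },
        LinkTier::Constrained => BatchParams {
            max_batch_bytes: Some(4 * 1024),
            flush_interval_ms: Some(50),
            compress: true,
        },
        LinkTier::Degraded => BatchParams {
            max_batch_bytes: Some(8 * 1024),
            flush_interval_ms: Some(200),
            compress: true,
        },
    }
}
```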

Heartbeat Protocol

A dedicated fifth QUIC stream sends 19-byte heartbeat frames every heartbeat_interval_ms (default: 5,000 ms). The server uses these frames to monitor client health and backpressure.

Text
Wire format (19 bytes, magic 0xBEAF):
[magic: u16 LE][ts_ns: u64 LE][queue_depth: u32 LE][spill_depth: u32 LE][circuit_state: u8]

circuit_state: 0=Closed (healthy), 1=Open (tripped), 2=HalfOpen (probing)
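To make the layout concrete, here is a sketch of encoding and decoding a frame in Rust. The field order, widths, endianness, and magic value follow the wire format above. The `Heartbeat` struct, the function names, and the choice to reject unknown `circuit_state` values on decode are assumptions made for the example.

```rust
const MAGIC: u16 = 0xBEAF;
const FRAME_LEN: usize = 19; // 2 + 8 + 4 + 4 + 1

/// Hypothetical in-memory form of one heartbeat frame.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Heartbeat {
    ts_ns: u64,
    queue_depth: u32,
    spill_depth: u32,
    circuit_state: u8, // 0=Closed, 1=Open, 2=HalfOpen
}

/// Serialize all fields little-endian in wire order.
fn encode(hb: &Heartbeat) -> [u8; FRAME_LEN] {
    let mut buf = [0u8; FRAME_LEN];
    buf[0..2].copy_from_slice(&MAGIC.to_le_bytes());
    buf[2..10].copy_from_slice(&hb.ts_ns.to_le_bytes());
    buf[10..14].copy_from_slice(&hb.queue_depth.to_le_bytes());
    buf[14..18].copy_from_slice(&hb.spill_depth.to_le_bytes());
    buf[18] = hb.circuit_state;
    buf
}

/// Parse a frame; returns None on wrong length, wrong magic,
/// or a circuit_state outside 0..=2.
fn decode(buf: &[u8]) -> Option<Heartbeat> {
    if buf.len() != FRAME_LEN {
        return None;
    }
    if u16::from_le_bytes([buf[0], buf[1]]) != MAGIC {
        return None;
    }
    let circuit_state = buf[18];
    if circuit_state > 2 {
        return None;
    }
    let mut ts = [0u8; 8];
    ts.copy_from_slice(&buf[2..10]);
    let mut queue = [0u8; 4];
    queue.copy_from_slice(&buf[10..14]);
    let mut spill = [0u8; 4];
    spill.copy_from_slice(&buf[14..18]);
    Some(Heartbeat {
        ts_ns: u64::from_le_bytes(ts),
        queue_depth: u32::from_le_bytes(queue),
        spill_depth: u32::from_le_bytes(spill),
        circuit_state,
    })
}
```

Because the magic is little-endian, the first two bytes on the wire are `0xAF` then `0xBE`.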
