In 2026, two very different incidents both surface as 429 Too Many Requests: gateway authentication brute-force protections firing on bad tokens, and upstream SaaS throttling because ten agents discovered cron at the same minute. If you merge those signals in one dashboard, you will mis-tune limits and either weaken security or starve legitimate automation. This matrix separates the classes, documents backoff rules for Retry-After headers, and lands eight steps that fit a dedicated Mac mini M4 gateway on NodeMac—SSH for automation, VNC when macOS still demands a consent surface.
Classify the 429 before you change knobs
Start every incident with three fields in your ticket: HTTP route family (admin, webhook, tool invoke), identity (workspace, bot token, IP), and provider (Slack, Anthropic, internal CRM). Authentication rate limits should trip on suspicious identity patterns; provider limits should trip on aggregate QPS or burst windows. Mixing them causes operators to raise global limits—which is exactly how brute-force windows widen.
- Auth 429: short sliding windows, exponential lockouts, exempt loopback health scrapers explicitly.
- Tool 429: honor provider Retry-After, cap parallel tool calls per workspace, prefer queue + worker over blind retry loops.
- Mixed: if both fire, fix auth first; retrying bad tokens amplifies both counters.
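A minimal classifier sketch under those rules, assuming requests are already tagged with the three ticket fields; the names (`Incident`, `classify429`) are illustrative, not a NodeMac API:

```ts
// Classify a 429 before touching any limits.
type RouteFamily = "admin" | "webhook" | "tool";

interface Incident {
  route: RouteFamily;
  identity: { workspace: string; tokenValid: boolean; ip: string };
  provider?: string; // set when the 429 came back from upstream SaaS
}

type Class429 = "auth" | "tool" | "mixed";

function classify429(i: Incident): Class429 {
  const authSignal = !i.identity.tokenValid;        // bad bearer / replayed token
  const providerSignal = i.provider !== undefined;  // upstream emitted the 429
  if (authSignal && providerSignal) return "mixed"; // fix auth first
  return authSignal ? "auth" : "tool";
}
```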
Response matrix
| Symptom | Likely class | First action |
|---|---|---|
| Spike only from one IP / bad bearer | Auth brute force or leaked token replay | Revoke token, inspect launchd env drift, rerun doctor |
| 429 aligned with business hours traffic | Tool or LLM quota saturation | Lower concurrent tools, shard workspaces, raise provider tier |
| 429 after deploy only | New default timeout or retry policy too aggressive | Diff config, canary one host, roll back gateway flags |
Backoff parameters that survive review
| Layer | Starting policy | Notes |
|---|---|---|
| Gateway auth failures | Sliding window: block aggressive IPs after 10 failures / 60s | Exempt documented health scraper subnets |
| Tool HTTP to SaaS | Max 3 retries with jitter, cap sleep at 60s unless Retry-After larger | Record cumulative delay metric per workspace |
| Concurrent tool calls | Default 4 per workspace on M4 Pro hosts | Lower when CPU > 85% for > 2 minutes |
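The tool-layer row, sketched as a retry wrapper. It assumes a global `fetch` (Node 18+); the per-workspace cumulative-delay metric is left as a comment because the metrics hook is deployment-specific:

```ts
const MAX_RETRIES = 3;
const CAP_MS = 60_000; // cap sleep at 60s unless Retry-After asks for more

// Retry-After can be delta-seconds or an HTTP-date.
function retryAfterMs(h: string | null): number | null {
  if (!h) return null;
  const secs = Number(h);
  if (!Number.isNaN(secs)) return secs * 1000;
  const date = Date.parse(h);
  return Number.isNaN(date) ? null : Math.max(0, date - Date.now());
}

async function callTool(url: string, init?: RequestInit): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url, init);
    if (res.status !== 429 || attempt >= MAX_RETRIES) return res;

    const hinted = retryAfterMs(res.headers.get("retry-after"));
    // Exponential backoff with full jitter, capped at CAP_MS; a larger
    // provider hint wins, per the table above.
    const jittered = Math.random() * Math.min(CAP_MS, 1000 * 2 ** attempt);
    const delayMs = hinted !== null ? Math.max(hinted, jittered) : jittered;
    // record cumulative delay per workspace here (hypothetical metric hook)
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
}
```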
Apple Silicon tip: TLS handshakes and JSON parsing are not free—burst retries can peg a single performance core and increase tail latency. Prefer queueing with visible depth metrics over unbounded fan-out.
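One way to keep fan-out bounded with a visible depth, sketched as a small in-process pool; a real deployment would export `depth` as a gauge through its metrics library:

```ts
// Bounded worker pool: at most `limit` tasks run; the rest queue.
class BoundedQueue {
  private waiters: Array<() => void> = [];
  private running = 0;
  constructor(private readonly limit = 4) {} // table default per workspace

  get depth(): number { return this.waiters.length; } // export as a gauge

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.running >= this.limit) {
      await new Promise<void>((resolve) => this.waiters.push(resolve));
    } else {
      this.running++;
    }
    try {
      return await task();
    } finally {
      const next = this.waiters.shift();
      if (next) next();     // hand this slot to the next waiter, FIFO
      else this.running--;  // or release it
    }
  }
}
```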
Eight rollout steps
- Tag logs with route family at the edge proxy if one exists.
- Instrument counters separately for auth failures vs upstream 429s (see the counter sketch after this list).
- Implement Retry-After parsing in the shared HTTP client for tools; the retry sketch above shows one approach.
- Add a synthetic chat probe that triggers a harmless tool once every five minutes.
- Document a freeze switch that disables tool side effects without stopping health checks.
- Load test with recorded peak traffic before marketing pushes.
- Align with security on IP allow-lists for admin surfaces exposed beyond loopback.
- Scale out with an additional NodeMac Mac mini M4 gateway when queue depth trends upward for a full sprint.
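Step 2 as code, assuming prom-client for metrics; the counter names and labels are suggestions rather than an existing NodeMac schema:

```ts
import { Counter } from "prom-client";

// Keep auth noise and quota pain in separate series so neither hides the other.
export const authFailures = new Counter({
  name: "gateway_auth_failures_total",
  help: "Credentials rejected at the gateway edge",
  labelNames: ["route_family", "ip"],
});

export const upstream429s = new Counter({
  name: "tool_upstream_429_total",
  help: "429 responses returned by upstream SaaS and LLM providers",
  labelNames: ["provider", "workspace"],
});

// In handlers:
//   authFailures.inc({ route_family: "admin", ip: clientIp });
//   upstream429s.inc({ provider: "slack", workspace: ws });
```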
FAQ
Why do dashboards look healthy while chat returns 429s?
Probes hit different routes than user-driven tool calls. Extend probes to cover tool paths lightly.
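A light tool-path probe can be this small; `PROBE_URL` and the alert path are assumptions for illustration:

```ts
// Hit one harmless tool route and flag 429s that chat-only probes miss.
const PROBE_URL = "http://127.0.0.1:8080/tools/echo"; // hypothetical route

async function probe(): Promise<void> {
  const res = await fetch(PROBE_URL, { signal: AbortSignal.timeout(5000) });
  if (res.status === 429) {
    console.error("tool path throttled while chat probes report healthy");
  }
}

setInterval(() => probe().catch(console.error), 5 * 60_000); // five-minute cadence
```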
Share counters between auth and tools?
No—separate counters prevent collateral lockouts and clarify incident root cause.
Why dedicated NodeMac hardware?
Stable network, always-on CPUs for TLS bursts, regional placement next to providers and users.