OpenClaw users judge the product on chat latency, while your roadmap judges it on throughput of long workspace automations. When both share one gateway on a single Mac mini M4, the failure mode is predictable: a repo-wide index rebuild or multi-minute tool chain grabs every core, and Slack replies jump from hundreds of milliseconds to tens of seconds. In 2026, publish a concurrency matrix that names mutex slots, cancellation semantics, and separate SLO classes—then enforce them with metrics, not vibes.
Related controls: gateway auth and tool rate limits, launchd scheduled-task alignment, and readiness probes with SLOs. If the same host also runs CI, read the CI concurrency fairness guide. Keep VNC available for break-glass access.
Two traffic classes, two budgets
Interactive chat is latency-sensitive and usually small-payload. Long workspace jobs are throughput-sensitive and may spawn subprocess trees, large disk IO, and repeated LLM calls. Treat them as competing tenants inside one OS—even when “one team” owns both—because the kernel does not know your org chart.
- Interactive: prioritize scheduling fairness and cap queue depth visible to users.
- Long-run: prioritize back-pressure and cancellation; never infinite retry loops.
- Hybrid commands: label them explicitly so routers pick the right budget.
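One way to make those budgets concrete is to give each class its own semaphore and route commands explicitly. The slot counts, budget names, and the `heavy` label set below are illustrative assumptions, not OpenClaw configuration:

```python
import asyncio

# Assumed sizes: reserve slots for chat, bound the batch side.
INTERACTIVE_SLOTS = asyncio.Semaphore(4)   # always-on chat capacity
LONG_RUN_SLOTS = asyncio.Semaphore(2)      # bounded batch workers

# Hypothetical labels for heavy/hybrid commands; in practice this
# list comes from your operator README, not a guess at runtime.
HEAVY_COMMANDS = {"reindex", "doc-rebuild", "fix-all-lint"}

def budget_for(command: str) -> asyncio.Semaphore:
    # Explicit labeling: the router picks a budget, it never infers one.
    return LONG_RUN_SLOTS if command in HEAVY_COMMANDS else INTERACTIVE_SLOTS

async def run(command: str, work):
    # Acquiring the class semaphore is the back-pressure point:
    # a long job queues here instead of starving chat replies.
    async with budget_for(command):
        return await work()
```

The point is that the two classes never compete for the same slot pool, so a batch burst queues against its own budget.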
Concurrency matrix
| Workload | Default slot policy | User-visible risk |
|---|---|---|
| DM answers with light tool calls | Always-on reserved slot(s) | Perceived “bot is down” if p95 > ~3s |
| Nightly doc rebuild across monorepo | Bounded parallel workers + mutex on git operations | Chat starvation if mutex missing |
| Human-triggered “fix all lint” tsunami | Queue with visible position + cancel | Duplicate edits if cancel is not cooperative |
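The middle row is the one that bites most teams: parallel rebuild workers are fine, but git operations must be serialized or chat-path git calls starve. A minimal sketch of that split, with assumed worker counts and a hypothetical `rebuild_shard` job:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# One git operation at a time per repo; everything else runs in parallel.
GIT_MUTEX = threading.Lock()
REBUILD_POOL = ThreadPoolExecutor(max_workers=3)  # assumed bound

def rebuild_shard(shard: str) -> str:
    # CPU/IO-heavy indexing happens outside the mutex...
    result = f"indexed:{shard}"
    # ...and only the git touch-point is serialized, so interactive
    # git calls wait milliseconds, not the length of a full rebuild.
    with GIT_MUTEX:
        return result
```

Keeping the mutex scope tight is the whole trick: a mutex around the entire rebuild would serialize the workers and recreate the starvation it was meant to prevent.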
Cancellation and cooperative timeouts
A cancel button that only stops the parent coroutine while child xcodebuild processes keep running is worse than no cancel—it creates partial writes. Standardize: propagate cancellation tokens, use process groups where available, and set hard wall-clock caps per tool class with audit logs when killed.
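On a POSIX host (macOS included), the process-group part can be sketched like this. The escalation timings and log format are assumptions; the key calls are `start_new_session` and `os.killpg`, which reach the whole subprocess tree rather than just the parent:

```python
import os
import signal
import subprocess

def run_with_group(cmd, hard_cap_s: float) -> int:
    # start_new_session=True puts the child in its own process group,
    # so cancellation reaches grandchildren (e.g. xcodebuild workers).
    proc = subprocess.Popen(cmd, start_new_session=True)
    try:
        return proc.wait(timeout=hard_cap_s)
    except subprocess.TimeoutExpired:
        # Cooperative first: SIGTERM the whole group.
        os.killpg(os.getpgid(proc.pid), signal.SIGTERM)
        try:
            proc.wait(timeout=10)
        except subprocess.TimeoutExpired:
            # Escalate to SIGKILL only if the group ignores SIGTERM.
            os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
            proc.wait()
        # Audit line on every kill; format is illustrative.
        print(f"audit: killed {cmd!r} after {hard_cap_s}s wall clock")
        return proc.returncode
```

A negative return code signals the job was killed rather than completed, which is exactly what the audit log should record.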
| Tool family | Soft timeout | Hard kill |
|---|---|---|
| HTTP JSON APIs | 30s client read | 90s absolute |
| Local compile / tests | Progress events every 60s | 45 min cap unless ticketed override |
| Disk-heavy sync | IO throughput floor alarm | Operator cancel + checksum verify |
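The numeric rows of that table can live in code so tool runners and dashboards read the same budgets. The dataclass shape and key names below are assumptions; the values mirror the table:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToolBudget:
    soft_timeout_s: int   # warn / require a progress event by this point
    hard_kill_s: int      # absolute wall-clock cap, no exceptions

# Illustrative registry keyed by tool family.
BUDGETS = {
    "http_json": ToolBudget(soft_timeout_s=30, hard_kill_s=90),
    "compile_test": ToolBudget(soft_timeout_s=60, hard_kill_s=45 * 60),
}
```

Disk-heavy sync is deliberately absent: its trigger is an IO-throughput floor alarm plus operator judgment, not a fixed clock.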
Operator note: schedule heavy jobs with calendar jitter so they do not align with daily standup message bursts—simple, effective, and boring.
Eight rollout steps
- Instrument p95 chat latency separately from job completion time.
- Define mutexes around git, package managers, and simulator boot.
- Reserve slots for interactive traffic on each gateway host.
- Wire dashboards for queue depth and cancel success rate.
- Document which commands are “heavy” in your SOUL or operator README.
- Run load tests mixing chat bursts with scheduled jobs.
- Split hosts when metrics show sustained contention—add a second NodeMac Mac mini M4.
- Post-incident review must cite which budget was exceeded.
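The first step, tracking p95 chat latency separately from job completion time, needs nothing fancier than per-class sample buckets. A minimal sketch (the class names and percentile method are assumptions; a production gateway would export this to its metrics backend instead):

```python
from collections import defaultdict

class LatencyTracker:
    """Per-class latency samples so chat and batch never share a histogram."""

    def __init__(self):
        self.samples = defaultdict(list)

    def record(self, traffic_class: str, seconds: float) -> None:
        self.samples[traffic_class].append(seconds)

    def p95(self, traffic_class: str):
        # Nearest-rank percentile; fine for dashboards, not for billing.
        xs = sorted(self.samples[traffic_class])
        if not xs:
            return None
        return xs[max(0, int(0.95 * len(xs)) - 1)]
```

Separate buckets are what let a post-incident review say "interactive p95 blew its budget while batch stayed inside its own", which is exactly the citation the last rollout step demands.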
FAQ
Why is chat slow only during nightly jobs?
Shared CPU, IO, and tool concurrency caps. Reserve interactive slots and cap parallel long jobs.
One gateway process or two?
In production, either isolate the two classes into separate processes or enforce strict mutex tiers; sharing one process without limits produces tail-latency spikes.
How does NodeMac help?
Dedicated M4 per role/region, SSH automation, optional VNC—split chat and batch across hosts.