When a Mac mini M4 is a build node, the disk often fills before the CPU does. Xcode DerivedData, simulator images, cached CocoaPods or SwiftPM trees, and intermediate artifacts that never reached object storage stack up until xcodebuild fails with opaque I/O errors while the runner still looks “online.” This article is a 2026-ready playbook: define watermarks and alerts first, then decide per data class whether to delete locally, upload remotely, or shorten TTL, and finish with a six-step cleanup sequence plus hooks into queue-wait SLOs and runner drain. It includes two reference tables and explicit numeric defaults you can paste into runbooks.
For scaling context, see queue depth and wait-time SLOs; for maintenance handoffs, see runner drain and handoff. If you need burst capacity during a cleanup window, review pricing and regions; for connectivity questions, see the help pages.
Treat disk like a first-class SLO signal, not a footnote. Teams that only graph CPU and memory routinely mis-diagnose “we need more Macs” when the real issue is a 92% full APFS volume and incremental compiles thrashing after a midnight cleanup script deleted the wrong branch’s cache. The matrix below separates renewable caches from compliance-bound binaries so security and platform engineering can agree on what may disappear overnight.
Three pain patterns: “out of disk” is never a one-sentence diagnosis
- DerivedData and index bloat: With many branches in flight, a single repo’s DerivedData can grow to 40–90 GB in weeks. Incremental builds depend on it, so a blind `rm -rf` forces a cold compile storm.
- Artifacts that never leave the host: Pipelines leave `.ipa` bundles or test packages under `/tmp`; retries multiply disk usage silently.
- Monitoring that ignores disk: At 95% utilization the queue still accepts jobs until random failures surface; p95 wait is misread as “capacity,” not “storage backpressure.”
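A quick way to confirm which of these patterns a host has is to rank directories by size before touching anything. A minimal POSIX-shell sketch; on a runner you would point it at `$HOME/Library/Developer` or `/tmp`:

```shell
#!/bin/sh
# List the N largest immediate subdirectories of a path, largest first.
# du -sk gives per-directory totals in kilobytes; sort -rn puts the
# biggest consumers at the top of the report.
top_dirs() {
  dir="$1"; n="${2:-10}"
  du -sk "$dir"/*/ 2>/dev/null | sort -rn | head -n "$n"
}
```

Run it as `top_dirs "$HOME/Library/Developer" 10` during the baseline snapshot and paste the output into the cleanup ticket.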
Disk watermark thresholds and alert actions
| Free space on root volume | Recommended action | Accept new jobs? |
|---|---|---|
| > 20% | Routine monitoring only | Yes |
| 12–20% | Trigger DerivedData LRU by last-access age | Yes, with notice |
| < 10% | Remove default labels and drain the host | No until > 15% |
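The watermark table above can be encoded directly as a decision function. A sketch; note the table leaves the 10–12% band unspecified, so treating it as an urgent trim is this version’s assumption:

```shell
#!/bin/sh
# Map free-space percentage on the root volume to the watermark
# table's action. Thresholds mirror the table above.
watermark_action() {
  free_pct="$1"   # integer percent of free space
  if [ "$free_pct" -gt 20 ]; then
    echo "monitor"            # routine monitoring only
  elif [ "$free_pct" -ge 12 ]; then
    echo "lru-trim"           # trigger DerivedData LRU, keep accepting jobs
  elif [ "$free_pct" -ge 10 ]; then
    echo "lru-trim-urgent"    # 10-12% band: assumption, trim and watch
  else
    echo "drain"              # remove labels; no jobs until free > 15%
  fi
}
# Deriving free_pct from df on macOS or Linux:
# free_pct=$((100 - $(df -P / | awk 'NR==2 {gsub("%",""); print $5}')))
```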
Retention matrix: delete, upload, or shorten TTL
| Data class | Preferred | Fallback | Avoid |
|---|---|---|---|
| Regenerable DerivedData | LRU trim oldest 30% by last access | Branch allowlist retention | Blanket rm -rf ~/Library/Developer |
| Auditable build outputs | Upload to object storage, 14-day lifecycle | Sync to read-only NFS | Long-term pile-up on runner SSD |
| Pods / SPM caches | Shared cache server + local cap 15 GB | Shard by lockfile hash | Unbounded re-resolve every job |
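The “LRU trim oldest 30% by last access” cell can be sketched as follows. Because build volumes often mount with `noatime`, this version uses modification time as the recency proxy, and it assumes per-project cache directory names contain no spaces (DerivedData’s `Name-hash` layout usually satisfies this):

```shell
#!/bin/sh
# Trim the oldest ~30% of per-project cache directories, newest kept.
# Run only while the host is drained; mtime stands in for last access.
lru_trim() {
  cache_root="$1"
  # Build "mtime path" lines, newest first. GNU stat uses -c %Y;
  # BSD stat on macOS uses -f %m, hence the fallback.
  list=$(for d in "$cache_root"/*/; do
    [ -d "$d" ] || continue
    printf '%s %s\n' "$(stat -c %Y "$d" 2>/dev/null || stat -f %m "$d")" "$d"
  done | sort -rn)
  total=$(printf '%s\n' "$list" | grep -c .)
  keep=$(( total * 7 / 10 ))          # keep the newest 70%
  printf '%s\n' "$list" | awk -v k="$keep" 'NR > k {print $2}' |
  while IFS= read -r victim; do
    rm -rf "$victim"
  done
}
```

The branch-allowlist fallback from the table would replace the `NR > k` cut with a grep against protected branch names.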
Tie to SLOs: If cleanup-induced cold builds raise per-job wall clock by more than 25%, disclose “storage governance tax” separately in the weekly report so it is not mistaken for runner performance regression.
Simulators and XCTest artifacts
iOS simulators accumulate device data and screenshot caches; 20–50 GB per host is common. Pick a non-release week for a “simulator reset day”: confirm no UI test suites are running, then uninstall unused runtimes by device family. If you rely on snapshot tests, store golden images in object storage—not on the runner—so cleanup scripts do not wipe baselines and create mass false positives.
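A reset-day helper might start like this. `xcrun simctl delete unavailable` removes simulators whose runtimes are no longer installed; the guard makes the script a no-op on non-macOS hosts so it can live in a shared tools repo:

```shell
#!/bin/sh
# Simulator reset-day sketch. Only meaningful on a macOS host with
# Xcode command-line tools; it exits cleanly elsewhere.
simulator_cleanup() {
  if ! command -v xcrun >/dev/null 2>&1; then
    echo "skip: xcrun not found (not a macOS build host)"
    return 0
  fi
  # Remove simulators whose runtime is no longer installed.
  xcrun simctl delete unavailable
  # Inspect remaining runtimes before deciding what to uninstall
  # by device family.
  xcrun simctl list runtimes
}
```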
For Xcode Archives, bind retention to release tickets: archives tied to shipped apps keep 90 days; experimental archives without a ticket recycle after 14 days. Document the rule in your internal wiki and link the same variables in CI templates to cut down “who deleted my build?” tickets.
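One way to mechanize the 90/14-day rule is a marker-file convention; the `release-ticket` file name below is hypothetical, standing in for however your pipeline links an archive to a shipped ticket:

```shell
#!/bin/sh
# Ticket-bound archive retention. Assumed convention: the release
# pipeline drops a "release-ticket" marker file into archives tied to
# a shipped build; unmarked archives count as experimental.
prune_archives() {
  archive_root="$1"
  now=$(date +%s)
  for a in "$archive_root"/*/; do
    [ -d "$a" ] || continue
    age_days=$(( (now - $(stat -c %Y "$a" 2>/dev/null || stat -f %m "$a")) / 86400 ))
    if [ -f "$a/release-ticket" ]; then
      ttl=90   # shipped: keep 90 days
    else
      ttl=14   # experimental: recycle after 14 days
    fi
    [ "$age_days" -gt "$ttl" ] && rm -rf "$a"
  done
  return 0
}
```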
Six-step ordered cleanup checklist
1. Baseline snapshot: Capture `df -h` and `du` output for the top 10 directories.
2. Remove inbound labels: Prevent new jobs from writing mid-cleanup.
3. Purge stale temp trees: Match ticket or build IDs; delete directories older than 72 h.
4. Run DerivedData LRU: Keep the default branch plus the 5 most recently active branches.
5. Green-path validation: Run a standard pipeline including code signing.
6. Re-label and observe for 60 minutes: Compare p95 queue wait against the pre-cleanup baseline.
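Step 3 of the checklist can be sketched as a one-liner around `find`; the `build-*` directory pattern is an assumption, so substitute whatever prefix your pipeline stamps on temp trees:

```shell
#!/bin/sh
# Purge temp build trees older than 72 hours.
# -mmin +4320 = modified more than 72 h (4320 min) ago;
# -maxdepth 1 limits the blast radius to top-level build dirs.
purge_stale_tmp() {
  tmp_root="${1:-/tmp}"
  find "$tmp_root" -maxdepth 1 -type d -name 'build-*' -mmin +4320 \
    -exec rm -rf {} +
}
```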
Balancing automated cleanup and human gates
Fully automated midnight cleanup is convenient but dangerous on release nights. A practical split: automation only touches obviously safe paths (temp builds, download caches past TTL). DerivedData LRU should require daytime approval or a low-traffic window. Put cron jobs and orchestrator maintenance on the same calendar so “surprise rm” never stacks on top of a planned drain and empties the queue.
Track both “GB reclaimed” and “median compile wall clock in the following 24 hours.” If space improves but build time crosses your threshold, roll back the policy or widen the branch allowlist. A single KPI on disk percentage convinces leadership the problem is “fixed” while developers feel everything slowed down.
Dedicated hardware plus elastic cloud nodes
Apple Silicon M4 unified memory and fast SSD suit a “hot” build tier, but you still need policy to limit write amplification. NodeMac offers dedicated Mac mini M4 hosts with SSH and VNC in Hong Kong, Japan, Korea, Singapore, and the United States; use them as short-term overflow capacity while the primary pool is drained for disk work. Pay-as-you-go beats buying a whole machine for rare spikes. Describe disk policy with the same label vocabulary as capacity lending and parallel build sharding so platform and ops share one mental model.
Alerts, forecasting, and hidden APFS usage
Instead of paging at 95% full, use two thresholds: 75% opens a “cleanup backlog” ticket for the week; 85% applies a soft gate on large new jobs. Linear extrapolation from the last 14 days of build counts and average artifact size often predicts a breach 48 hours early.
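The extrapolation is simple enough to inline in a cron job. Inputs are assumed to come from your metrics store; the function answers “how many days until the 85% soft gate?”:

```shell
#!/bin/sh
# Linear breach forecast: given used GB today, average daily growth
# (GB/day) over the trailing window, and total volume size, print how
# many whole days remain before the 85% soft gate. Prints 0 if the
# gate is already crossed or there is no growth to extrapolate.
days_to_breach() {
  used_gb="$1"; growth_gb_per_day="$2"; total_gb="$3"
  awk -v u="$used_gb" -v g="$growth_gb_per_day" -v t="$total_gb" 'BEGIN {
    gate = 0.85 * t
    if (g <= 0 || u >= gate) { print 0; exit }
    printf "%d\n", (gate - u) / g
  }'
}
```

For example, `days_to_breach 700 10 1000` reports 15 days, comfortably beyond the 48-hour early-warning target.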
APFS snapshots and local Time Machine targets are frequent hidden consumers. Disable them in the baseline image if unused; if required, give snapshots their own quota and list top contributors weekly so “System Data” does not become a black box. On dedicated NodeMac machines, ask for snapshot policy in the delivery checklist to avoid first-month surprises.
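A weekly snapshot audit can be as small as this; `tmutil` ships only with macOS, so the guard keeps it harmless elsewhere, and the thinning command stays commented because it needs root:

```shell
#!/bin/sh
# APFS local-snapshot audit sketch; no-ops on non-macOS hosts.
snapshot_report() {
  if ! command -v tmutil >/dev/null 2>&1; then
    echo "skip: tmutil not found (not a macOS host)"
    return 0
  fi
  # Local Time Machine snapshots pinned on the root volume.
  tmutil listlocalsnapshots /
  # To reclaim up to ~20 GB at high urgency (run as root):
  # sudo tmutil thinlocalsnapshots / 21474836480 4
}
```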
Common anti-patterns
Indiscriminate rm -rf ~/Library/Developer/Xcode/DerivedData during peak hours destroys incremental builds for everyone. Mixing CocoaPods, SwiftPM caches, and release binaries on one volume without caps yields a “false victory”: DerivedData is gone but the disk is still full. Endlessly expanding cloud disks without changing the retention matrix raises cost linearly while the failure mode repeats on schedule.
Print the matrix on the first page of your ops handbook and review real GB saved versus build duration in quarterly reviews—far more effective than repeating “please free disk.” Wire disk soft gates into your orchestrator as a precondition alongside queue SLO checks to close the loop.