Headroom and Ponytail: Cutting Cloud Tokens Without Dumbing Down Agents
RTK shell filtering, local-offload lanes A–D, and Ponytail lite YAGNI rules — how I keep Cursor usable across 100+ projects.
- cursor
- ollama
- tokens
- agents
The bill is the architecture problem
I orchestrate 100+ project folders from one hub repo. Every git status, find, and log dump sent to a cloud model costs money and rots context. The fix isn’t “use AI less”; it’s classify first, offload locally, and hand the cloud a TLDR.
Headroom + RTK — shrink shell output
RTK wraps everyday commands (git, grep, docker, tests) and filters noise — failures and summaries only. Headroom proxies agent traffic and compresses context at the shell boundary. Together they routinely cut 60–90% of tokens that never should have left the machine.
Rule of thumb: if the answer fits in ten lines, the cloud agent shouldn’t see 500.
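To make the rule of thumb concrete, here is a minimal sketch of a "failures and summary only" filter. It is not RTK's or Headroom's actual implementation; the function name `filter_output`, the failure pattern, and the ten-line budget are assumptions for illustration.

```python
import re

# Hypothetical pattern for lines worth keeping; a real filter would be
# tuned per command (git, grep, docker, test runners).
FAILURE_PATTERN = re.compile(r"error|fail|fatal|traceback|exception", re.IGNORECASE)


def filter_output(raw: str, max_lines: int = 10) -> str:
    """Keep failure-looking lines and end with a one-line summary."""
    lines = [line for line in raw.splitlines() if line.strip()]
    failures = [line for line in lines if FAILURE_PATTERN.search(line)]
    # Reserve the last line of the budget for the summary.
    shown = failures[: max_lines - 1]
    summary = f"[{len(lines)} lines total, {len(shown)} shown]"
    return "\n".join(shown + [summary])
```

Fed 500 passing test lines and one failure, this returns two lines: the failure and the summary. The cloud agent sees those two lines instead of the full dump.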
Local-offload lanes A / B / C / D
local-offload.py classify "<task>" routes work:
| Lane | Engine | Use |
|---|---|---|
| A | scripts only | grep, recipes, no LLM |
| B | Qwen 7B on 4060 | fast crunch |
| C | DeepSeek-R1 32B | JSON plans |
| D | LLaDA diffusion | long prose drafts |
The cloud agent in Cursor reads CLOUD_TLDR.md plus the artifact path; it never re-runs the gather.
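The routing and handoff can be sketched roughly as follows. The real logic inside local-offload.py isn't shown in this note, so the keyword rules, the default to lane B, and the layout of the TLDR text are all assumptions; only the lane letters and engines come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lane:
    name: str
    engine: str


LANES = {
    "A": Lane("A", "scripts only"),
    "B": Lane("B", "qwen 7b on 4060"),
    "C": Lane("C", "deepseek-r1 32b"),
    "D": Lane("D", "LLaDA diffusion"),
}

# Hypothetical keyword hints, checked in order; the first match wins.
RULES = [
    ("D", ("draft", "essay", "blog post", "prose")),
    ("C", ("plan", "json", "design")),
    ("A", ("grep", "find", "list", "count", "status")),
]


def classify(task: str) -> Lane:
    """Pick a lane from keywords in the task description."""
    text = task.lower()
    for lane_id, keywords in RULES:
        if any(keyword in text for keyword in keywords):
            return LANES[lane_id]
    return LANES["B"]  # assumed default: the fast local model


def build_tldr(task: str, lane: Lane, summary: str, artifact_path: str) -> str:
    """Render the short note the cloud agent reads instead of raw output."""
    return "\n".join([
        f"# TLDR: {task}",
        f"Lane: {lane.name} ({lane.engine})",
        f"Summary: {summary}",
        f"Artifact: {artifact_path}",
    ]) + "\n"
```

In this sketch, `classify("grep for TODO in hub repo")` lands in lane A, and the string from `build_tldr` is what would be written to CLOUD_TLDR.md. Substring matching is deliberately crude: it keeps the classifier itself in lane A territory, with no LLM call needed to decide where a task goes.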
Ponytail lite — stop over-engineering
~/.cursor/rules/ponytail.mdc enforces YAGNI: no abstractions nobody asked for, deletion over addition, question complex requests before building.
The orchestrator hub gets a Ponytail lite exemption for multi-file agent infrastructure: that work is explicitly requested infrastructure, not scope creep.
Inkwell failed me today (honest note)
I tried generating this post with local 7B and 14B models via the Inkwell prompt. Both ignored the assigned topic and wrote generic “AI content creation” essays, so I wrote this note manually from real stack docs. Lesson: lane D/heavy needs tighter prompt enforcement or QUILL chronicle input; the pipeline isn’t fire-and-forget yet.
Next step
Wire generate-davidcole-notes.sh to reject responses that don’t mention the POST TOPIC slug — cheap local validation before any MDX lands on the site.
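A minimal sketch of that check, written in Python to match local-offload.py rather than in the shell script itself. It assumes the POST TOPIC slug is a hyphen-separated string such as `headroom-and-ponytail`; the function name `mentions_topic` and the `min_hits` threshold are hypothetical.

```python
import re


def mentions_topic(response: str, slug: str, min_hits: int = 1) -> bool:
    """Return True if the response mentions the slug, hyphenated or spaced."""
    words = [word for word in slug.lower().split("-") if word]
    if not words:
        return False  # an empty slug can never validate a draft
    # Allow hyphens or whitespace between the slug's words.
    pattern = r"[-\s]+".join(re.escape(word) for word in words)
    return len(re.findall(pattern, response.lower())) >= min_hits
```

A generic "AI content creation" essay fails this check for the slug `headroom-and-ponytail`, while a draft that names the topic passes. It won't catch a draft that mentions the topic once and then wanders off, so raising `min_hits` is the obvious first tightening.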