June 24, 2026 Agentic Workflows

How I Actually Use AI — Hub, Swarm, and Local-First

One operator, 100+ projects — Cursor as conductor, Ollama on the Ryzen box, MSI for long runs, and cloud models only for decisions that need eyes.

cursor
agents
ollama
local-first
homelab

Figure: Retro-futuristic vaporwave workspace, a neon command center for solo AI orchestration. My daily command surface on Nobara: Cursor up front, Ryzen hub behind it, swarm nodes on Tailscale.

I’m a solo founder in Muskogee running a 501(c)(3), a web agency, trading experiments, and a pile of desktop apps — from one workspace with 100+ project folders. AI isn’t a chatbot I ask for poems. It’s infrastructure: routing, memory, and labor I can’t afford to hire.

AI-assisted, human-verified everywhere. I architect; agents implement; I merge what I understand.

The mental model: hub and spokes

One repo — ORGANIZATION — is the hub. Client sites, Tauri apps, trading bots, and docs live in spokes (01_web_design/, 05_apps_and_extensions/, etc.). The cloud agent in Cursor is the conductor, not the orchestra.

Diagram: MacBook conductor, MSI worker, Ryzen Ollama hub, NAS vault, ORGANIZATION hub, project spokes, and local offload lanes A–D. Four-machine swarm plus hub-and-spoke routing. Cloud Cursor never sees the whole tree, only TLDR handoffs.

Role	Machine	What it does
Conductor	MacBook Air	Cursor — routing, review, surgical edits
Local muscle	Ryzen desktop	Ollama, grep, doc crunch, lanes A–D
Night shift	MSI laptop	Scrapers, long prose, `agy` over SSH
Vault	i7 fileserver	NAS, Nextcloud, git mirrors
Spokes	100+ folders	One project per path — scoped agent sessions

I never dump the whole tree into chat. I read a map (WORKSPACE.md, FOLDER_MAP.md), scope to one spoke, and hand the cloud agent a TLDR plus artifact paths — not raw logs.

Classify first, then pick a lane

Every non-trivial task starts with:

python3 scripts/context-stack/local-offload.py classify "<what I'm doing>"

That routes into lanes:

Lane	Engine	Typical work
A	No LLM	grep, stats, pandoc, `ctx_execute` sandbox
B	qwen 7b (4060)	Fast summaries, copy crunch
C	deepseek-r1 32b	JSON plans, reasoning
D	LLaDA diffusion	Long prose drafts
async	MSI `agy`	Blog batches, scrapers, 20+ min jobs
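
The post does not show what local-offload.py does internally. As a minimal sketch of a classify step, assuming plain keyword rules (the `RULES` table, the `Route` type, and the `classify` function are all hypothetical and only mirror the lane table above), routing could look like this:

```python
from dataclasses import dataclass

# Hypothetical keyword rules; the real local-offload.py may work very differently.
# The first matching rule wins, so order encodes priority.
RULES = [
    ("async", ("scrape", "batch", "overnight")),
    ("A", ("grep", "count", "stats", "convert")),
    ("B", ("summarize", "summary", "tldr")),
    ("C", ("plan", "reason", "json")),
    ("D", ("draft", "prose", "essay")),
]

# Engine labels copied from the lane table.
ENGINES = {
    "A": "no LLM",
    "B": "qwen 7b",
    "C": "deepseek-r1 32b",
    "D": "LLaDA diffusion",
    "async": "MSI agy",
}


@dataclass(frozen=True)
class Route:
    lane: str
    engine: str


def classify(task: str, default: str = "B") -> Route:
    """Map a free-text task description to a lane by substring match."""
    text = task.lower()
    for lane, keywords in RULES:
        if any(keyword in text for keyword in keywords):
            return Route(lane, ENGINES[lane])
    # Nothing matched: fall back to the fast local model.
    return Route(default, ENGINES[default])
```

Substring matching is crude (for example, "count" also matches "account"), so a real classifier would likely need word boundaries or a small local model behind it.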

Figure: Workspace RAG concept, semantic search across project docs without indexing secrets. Local-first context: gather and index on the hub; cloud applies patches from handoff JSON only.

Cloud Cursor reads CLOUD_TLDR.md or .agents/LOGS/offload-handoff.json and never re-runs the gather. That alone saves roughly 85–90% of tokens compared with pasting git status and find output into chat. Full pipeline: Documenting 100+ Projects.
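
The handoff file's schema is not given in this post. A minimal sketch of the idea, assuming a three-field layout (`spoke`, `tldr`, `artifacts` are hypothetical field names, and `build_handoff` and the 1,200-character cap are illustrative), is a builder that refuses anything that looks like a raw log dump:

```python
import json

# Hypothetical cap: a TLDR longer than this is probably a pasted log.
MAX_TLDR_CHARS = 1200


def build_handoff(spoke: str, tldr: str, artifacts: list[str]) -> str:
    """Return handoff JSON: a short summary plus pointers to files on disk."""
    if len(tldr) > MAX_TLDR_CHARS:
        raise ValueError("TLDR too long; write details to an artifact instead")
    handoff = {
        "spoke": spoke,
        "tldr": tldr.strip(),
        # Paths only. The cloud agent opens a file if it needs the detail.
        "artifacts": sorted(set(artifacts)),
    }
    return json.dumps(handoff, indent=2)
```

The caller would write the returned string to the handoff path; keeping the builder free of file I/O makes the size check easy to test.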

Cursor: rules, skills, and Ponytail

Cursor is my primary IDE. The hub repo carries .cursorrules and domain rules so every session knows:

Folder map discipline — check the index before grepping the universe
Local housekeeping — cloud agent applies patches; local scripts do audits
Context-mode — analyze in sandbox code, stdout only
Ponytail lite — YAGNI on leaf apps; orchestration infra is explicitly allowed on the hub

I pull patterns from a 573-skill library (SKILL_LIBRARY/) into project .agents/SKILLS/ when a spoke needs them. Recurring work becomes a SKILL.md, not a re-explained prompt. Token stack deep-dive: Headroom and Ponytail.

Multi-model review when stakes are high

For security-sensitive or cross-platform desktop work (Gnomad Slate, Webcanvas), I don’t trust one model. I run parallel reviews through NVIDIA NIM — GLM, Kimi, Nemotron, Qwen — merge P0 consensus, then implement.

Figure: Distributed multi-agent orchestration, parallel expert reviewers feeding a single implementation plan. Parallel NIM reviews → one consolidated P0 list → human-verified implementation.

Reports land in reviews/; the cloud agent gets the synthesis, not four full dumps. Greenfield apps use the project agentic loop: brief → local planner → expert panel → phased build. Cloud reads LOOP_STATE.json and PANEL_SYNTHESIS.md only.
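
How the P0 lists get merged is not spelled out here. One plausible reading, sketched below under the assumption that each reviewer returns short finding strings that match after normalization, is a simple vote count (`p0_consensus` and `min_votes` are hypothetical names):

```python
from collections import Counter


def p0_consensus(reviews: dict[str, list[str]], min_votes: int = 2) -> list[str]:
    """Return findings that at least min_votes reviewers flagged as P0."""
    votes: Counter[str] = Counter()
    for findings in reviews.values():
        # Normalize and de-duplicate so one reviewer cannot vote twice.
        votes.update({finding.strip().lower() for finding in findings})
    return sorted(f for f, count in votes.items() if count >= min_votes)


reviews = {
    "glm": ["Unsanitized IPC input", "Missing CSP"],
    "kimi": ["unsanitized ipc input"],
    "qwen": ["Missing CSP ", "Slow startup"],
}
print(p0_consensus(reviews))
# ['missing csp', 'unsanitized ipc input']
```

Real model reviews rarely phrase the same issue identically, so exact matching is the weak point of this sketch; grouping similar findings would need a fuzzier step or a human pass.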

Personas: INKWELL and QUILL

Long-form posts for davidcole.cloud use two voices:

Persona	Role
INKWELL	Technical blog MDX — `scripts/context-stack/prompts/inkwell-blog.md`
QUILL	Session chronicle — `npm run chronicle`

Inkwell is supposed to run on local 14b/heavy lanes. Honest failure today: I tried generating this post with local 7b and 14b via the Inkwell prompt. Both ignored the assigned topic and wrote generic “AI content creation” essays. I wrote this note manually from real stack docs.

Lane D/heavy is not fire-and-forget yet — it needs tighter prompt enforcement or QUILL chronicle input before I trust auto-drafts.

generate-davidcole-notes.sh now rejects responses that do not mention the POST TOPIC slug before any MDX lands on the site.
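
The actual check lives in a shell script that is not reproduced here. As a rough Python sketch of the same guard, assuming the slug is derived from the topic by lowercasing and hyphenating (`slugify` and `mentions_topic` are hypothetical helpers):

```python
import re


def slugify(text: str) -> str:
    """Lowercase the text and collapse every non-alphanumeric run to a hyphen."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def mentions_topic(draft: str, topic: str) -> bool:
    """True if the draft contains the topic's words, in order, anywhere."""
    slug = slugify(topic)
    # Slugify the draft too, so "Hub and Spoke" and "hub-and-spoke" both match.
    return bool(slug) and slug in slugify(draft)
```

A draft that fails `mentions_topic` would be discarded before any MDX is written. The check is deliberately dumb: it catches a model that ignored the topic entirely, not one that mentions it once and then wanders off.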

Trading VPS: agents with memory

Paper trading on a Hostinger VPS (discipline over dopamine) now has Hermes — a nightly meta-agent that reviews fills, updates per-bot knowledge bases, patches bounded .env keys, and exports SFT JSONL for weekend training on the MSI.

Figure: Algorithmic trading agent stack, VPS execution loop with local training on MSI. Train local, loop remote: VPS trades; Hermes remembers; MSI fine-tunes on exported JSONL.

Details: Hermes on the VPS. Repo: VPS_Agent.
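
"Bounded" .env patching is not defined further in this post. One way to read it, sketched here with made-up key names and ranges (`BOUNDS` and `apply_patch` are hypothetical, not Hermes code), is an allowlist where every tunable key carries a hard minimum and maximum:

```python
# Hypothetical bounds; key names and ranges are illustrative only.
BOUNDS = {
    "MAX_POSITION_PCT": (0.5, 5.0),
    "STOP_LOSS_PCT": (0.5, 3.0),
}


def apply_patch(env: dict[str, str], patch: dict[str, float]) -> dict[str, str]:
    """Return a new env with only allowlisted, in-range changes applied."""
    updated = dict(env)
    for key, value in patch.items():
        if key not in BOUNDS:
            continue  # never touch keys outside the allowlist (API keys, URLs)
        low, high = BOUNDS[key]
        if low <= value <= high:
            updated[key] = str(value)
        # Out-of-range proposals are dropped rather than clamped.
    return updated
```

Dropping out-of-range values instead of clamping them keeps a confused agent from silently pinning a setting at its limit.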

Hardware honesty

Dual-GPU Ollama on Nobara (5060 Ti + 4060): I keep a 7b model warm for fast crunch and unload it before loading the 32b reasoner. I stop Ollama when I’m not using it; GPUs run hot.
RTK + Headroom shrink shell output by 60–90% before it hits the model.
Tailscale ties the mesh together without exposing Ollama to the internet (it listens on 127.0.0.1 only).
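
Unloading one model before loading another can be done through Ollama's HTTP API: a request to `/api/generate` with `keep_alive` set to 0 asks the server to drop that model from memory. The sketch below assumes the default port 11434 on localhost; the helper names and the model tag in the usage note are hypothetical.

```python
import json
import urllib.request

# Assumes Ollama's default local address; adjust if yours differs.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"


def unload_request(model: str) -> urllib.request.Request:
    """Build the request that asks Ollama to unload a model immediately."""
    body = json.dumps({"model": model, "keep_alive": 0}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def unload(model: str, timeout: float = 10.0) -> int:
    """Send the unload request and return the HTTP status code."""
    with urllib.request.urlopen(unload_request(model), timeout=timeout) as resp:
        return resp.status
```

Calling `unload("qwen2.5:7b")` before starting the larger model would free the VRAM the small one was holding, assuming that is the tag the 7b model was pulled under.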

What I don’t do

Send secrets or VAULT paths to RAG or cloud context
Let agents run repo-wide find without a map lookup first
Trust agent output without npm run build / cargo test on the spoke
Auto-publish Inkwell drafts without reading them
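
The build-or-test rule above can be made mechanical. This is a minimal sketch, assuming a spoke is identified by a package.json or Cargo.toml at its root (`verify_commands` and `verify` are hypothetical helpers, not scripts from the hub repo):

```python
import subprocess
from pathlib import Path


def verify_commands(spoke: Path) -> list[list[str]]:
    """Pick verification commands from the manifest files present in a spoke."""
    commands: list[list[str]] = []
    if (spoke / "package.json").is_file():
        commands.append(["npm", "run", "build"])
    if (spoke / "Cargo.toml").is_file():
        commands.append(["cargo", "test"])
    return commands


def verify(spoke: Path) -> bool:
    """Run every applicable check; an unverifiable spoke counts as a failure."""
    commands = verify_commands(spoke)
    if not commands:
        return False
    # all() stops at the first failing command.
    return all(subprocess.run(cmd, cwd=spoke).returncode == 0 for cmd in commands)
```

Treating "nothing to run" as a failure is a design choice: it forces a human look at any spoke that has no automated check yet.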

The pattern in one line

Directive → classify → local gather → cloud applies → human checkpoint → chronicle.

That’s how one person ships client sites, desktop alphas, and a trading stack without pretending to be a team of twelve.

What I’d do differently

I should have built the handoff contract (TLDR + JSON artifacts) before scaling to 100 projects — not after the third “why is Cursor slow” meltdown. Earlier Hermes-style memory on the trading VPS would have saved weeks of re-reading JSONL by hand.

Next step: wire QUILL chronicle into the Inkwell brief automatically so lane D drafts start from session memory, not a static file.

Start here if you’re new: Agentic Workflows for Solo Developers · Headroom and Ponytail