How I Actually Use AI — Hub, Swarm, and Local-First

One operator, 100+ projects — Cursor as conductor, Ollama on the Ryzen box, MSI for long runs, and cloud models only for decisions that need eyes.

  • cursor
  • agents
  • ollama
  • local-first
  • homelab
Retro-futuristic vaporwave workspace — neon command center for solo AI orchestration
My daily command surface on Nobara — Cursor up front, Ryzen hub behind it, swarm nodes on Tailscale.

I’m a solo founder in Muskogee running a 501(c)(3), a web agency, trading experiments, and a pile of desktop apps — from one workspace with 100+ project folders. AI isn’t a chatbot I ask for poems. It’s infrastructure: routing, memory, and labor I can’t afford to hire.

AI-assisted, human-verified everywhere. I architect; agents implement; I merge what I understand.


The mental model: hub and spokes

One repo — ORGANIZATION — is the hub. Client sites, Tauri apps, trading bots, and docs live in spokes (01_web_design/, 05_apps_and_extensions/, etc.). The cloud agent in Cursor is the conductor, not the orchestra.

Diagram: MacBook conductor, MSI worker, Ryzen Ollama hub, NAS vault, ORGANIZATION hub, project spokes, and local offload lanes A-D
Four-machine swarm + hub-and-spoke routing. Cloud Cursor never sees the whole tree — only TLDR handoffs.
| Role | Machine | What it does |
| --- | --- | --- |
| Conductor | MacBook Air | Cursor: routing, review, surgical edits |
| Local muscle | Ryzen desktop | Ollama, grep, doc crunch, lane A–D |
| Night shift | MSI laptop | Scrapers, long prose, agy over SSH |
| Vault | i7 fileserver | NAS, Nextcloud, git mirrors |
| Spokes | 100+ folders | One project per path; scoped agent sessions |

I never dump the whole tree into chat. I read a map (WORKSPACE.md, FOLDER_MAP.md), scope to one spoke, and hand the cloud agent a TLDR plus artifact paths — not raw logs.


Classify first, then pick a lane

Every non-trivial task starts with:

python3 scripts/context-stack/local-offload.py classify "<what I'm doing>"

That routes into lanes:

| Lane | Engine | Typical work |
| --- | --- | --- |
| A | No LLM | grep, stats, pandoc, ctx_execute sandbox |
| B | qwen 7b (4060) | Fast summaries, copy crunch |
| C | deepseek-r1 32b | JSON plans, reasoning |
| D | LLaDA diffusion | Long prose drafts |
| async | MSI agy | Blog batches, scrapers, 20+ min jobs |
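The real local-offload.py is not reproduced in this post, so the following is only a minimal sketch of how a keyword-based classifier could map a task description onto the lanes above. The keyword lists, the rule order, and the default lane are all assumptions made for illustration.

```python
# Hypothetical sketch: keyword lists and rule order are illustrative
# assumptions, not the rules used by the real local-offload.py.
LANE_KEYWORDS = [
    ("async", ("scrape", "batch", "overnight")),
    ("A", ("grep", "count", "stats", "convert")),
    ("D", ("draft", "long-form", "essay")),
    ("C", ("plan", "reason", "json")),
    ("B", ("summarize", "summary", "rewrite")),
]


def classify(task: str, default: str = "B") -> str:
    """Return the first lane whose keyword appears in the task description."""
    text = task.lower()
    for lane, keywords in LANE_KEYWORDS:
        # Plain substring match: cheap, but "count" also matches "account".
        if any(word in text for word in keywords):
            return lane
    return default
```

Rules are checked in order, so the long-running and no-LLM lanes win before any model is considered. A real classifier would likely need word-boundary matching or a small local model for ambiguous tasks.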
Workspace RAG concept — semantic search across project docs without indexing secrets
Local-first context: gather and index on the hub — cloud applies patches from handoff JSON only.

Cloud Cursor reads CLOUD_TLDR.md or .agents/LOGS/offload-handoff.json and never re-runs the gather. That alone saves roughly 85–90% of tokens compared with pasting git status and find output into chat. Full pipeline: Documenting 100+ Projects.
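To make the handoff idea concrete, here is a minimal sketch of a writer for a handoff file. The schema (a `tldr` string plus a list of `artifacts` paths), the function names, and the character cap are assumptions; the post does not show the actual contents of offload-handoff.json.

```python
import json
from pathlib import Path


def build_handoff(tldr: str, artifacts: list[str], max_tldr_chars: int = 1200) -> dict:
    """Bundle a short summary plus artifact paths; raw logs are never inlined."""
    return {
        "tldr": tldr.strip()[:max_tldr_chars],  # hard cap keeps the summary small
        "artifacts": sorted(set(artifacts)),    # de-duplicated, stable order
    }


def write_handoff(handoff: dict, path: Path) -> None:
    """Write the handoff as pretty-printed JSON, creating parent folders."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(handoff, indent=2) + "\n", encoding="utf-8")
```

The point of the design is that the cloud agent receives pointers to files on disk rather than their contents, so the summary stays a fixed size no matter how large the gather was.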


Cursor: rules, skills, and Ponytail

Cursor is my primary IDE. The hub repo carries .cursorrules and domain rules so every session knows:

  • Folder map discipline — check the index before grepping the universe
  • Local housekeeping — cloud agent applies patches; local scripts do audits
  • Context-mode — analyze in sandbox code, stdout only
  • Ponytail lite — YAGNI on leaf apps; orchestration infra is explicitly allowed on the hub

I pull patterns from a 573-skill library (SKILL_LIBRARY/) into project .agents/SKILLS/ when a spoke needs them. Recurring work becomes a SKILL.md, not a re-explained prompt. Token stack deep-dive: Headroom and Ponytail.


Multi-model review when stakes are high

For security-sensitive or cross-platform desktop work (Gnomad Slate, Webcanvas), I don’t trust one model. I run parallel reviews through NVIDIA NIM — GLM, Kimi, Nemotron, Qwen — merge P0 consensus, then implement.

Distributed multi-agent orchestration — parallel expert reviewers feeding a single implementation plan
Parallel NIM reviews → one consolidated P0 list → human-verified implementation.

Reports land in reviews/; the cloud agent gets the synthesis, not four full dumps. Greenfield apps use the project agentic loop: brief → local planner → expert panel → phased build. Cloud reads LOOP_STATE.json and PANEL_SYNTHESIS.md only.
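Merging P0 consensus can be sketched as a simple vote count. This assumes each reviewer report has already been normalized into findings with a shared `id` and a `severity` field, which is a hypothetical format; in practice, matching equivalent findings across models is the hard part and is not shown here.

```python
from collections import defaultdict


def p0_consensus(reports: dict[str, list[dict]], min_votes: int = 2) -> list[dict]:
    """Keep P0 findings that at least `min_votes` distinct reviewers raised."""
    voters = defaultdict(set)
    for reviewer, findings in reports.items():
        for finding in findings:
            if finding["severity"] == "P0":
                voters[finding["id"]].add(reviewer)
    merged = [
        {"id": finding_id, "reviewers": sorted(names)}
        for finding_id, names in voters.items()
        if len(names) >= min_votes
    ]
    # Most-agreed findings first, then alphabetical for a stable order.
    return sorted(merged, key=lambda item: (-len(item["reviewers"]), item["id"]))
```

The result is the kind of short synthesis the cloud agent would read, rather than four full reports.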


Personas: INKWELL and QUILL

Long-form posts for davidcole.cloud use two voices:

PersonaRole
INKWELLTechnical blog MDX — scripts/context-stack/prompts/inkwell-blog.md
QUILLSession chronicle — npm run chronicle

Inkwell is supposed to run on local 14b/heavy lanes. Honest failure today: I tried generating this post with local 7b and 14b via the Inkwell prompt. Both ignored the assigned topic and wrote generic “AI content creation” essays. I wrote this note manually from real stack docs.

Lane D/heavy is not fire-and-forget yet — it needs tighter prompt enforcement or QUILL chronicle input before I trust auto-drafts.

generate-davidcole-notes.sh now rejects responses that do not mention the POST TOPIC slug before any MDX lands on the site.
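The real gate lives in a shell script that is not shown here. As a rough Python analogue, a topic check might look like the sketch below; the function name is hypothetical, and accepting the slug with spaces in place of hyphens is an assumption about how lenient the check should be.

```python
def mentions_topic(response: str, slug: str) -> bool:
    """True if the draft mentions the slug, as written or with spaces for hyphens."""
    text = " ".join(response.lower().split())  # lowercase and collapse whitespace
    slug = slug.lower()
    return slug in text or slug.replace("-", " ") in text
```

A draft that fails this check would be discarded before any MDX is written. It is a weak guard: a model can mention the topic once and still drift, so it complements reading the draft rather than replacing it.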


Trading VPS: agents with memory

Paper trading on a Hostinger VPS (discipline over dopamine) now has Hermes — a nightly meta-agent that reviews fills, updates per-bot knowledge bases, patches bounded .env keys, and exports SFT JSONL for weekend training on the MSI.

Algorithmic trading agent stack — VPS execution loop with local training on MSI
Train local, loop remote — VPS trades; Hermes remembers; MSI fine-tunes on exported JSONL.

Details: Hermes on the VPS. Repo: VPS_Agent.
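One way to read "patches bounded .env keys" is an allowlist of tunable keys, each with a permitted numeric range. The sketch below follows that reading; the key names, the ranges, and the function are hypothetical and are not taken from the Hermes code.

```python
# Hypothetical bounds: key names and ranges are illustrative only.
BOUNDS = {
    "MAX_POSITION_PCT": (0.5, 5.0),
    "STOP_LOSS_PCT": (0.5, 3.0),
}


def apply_bounded_patch(
    env: dict[str, str], proposed: dict[str, float]
) -> tuple[dict[str, str], list[str]]:
    """Apply only allowlisted keys whose new value falls inside its range."""
    patched = dict(env)  # never mutate the caller's mapping
    rejected = []
    for key, value in proposed.items():
        bounds = BOUNDS.get(key)
        if bounds is None or not (bounds[0] <= value <= bounds[1]):
            rejected.append(key)  # unknown key or out-of-range value
            continue
        patched[key] = str(value)
    return patched, rejected
```

Under this scheme a nightly agent can nudge risk parameters but cannot touch credentials or set a value outside limits a human chose in advance.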


Hardware honesty

  • Dual GPU Ollama on Nobara (5060 Ti + 4060): I keep the 7b warm for fast crunch and unload it before loading the 32b reasoner. I stop Ollama when I’m not using it; GPUs run hot.
  • RTK + Headroom shrink shell output 60–90% before it hits the model.
  • Tailscale ties the mesh together without exposing Ollama to the internet; Ollama listens on 127.0.0.1 only.
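The unload-then-load step can be scripted against Ollama's HTTP API, where a `/api/generate` request with `keep_alive` set to 0 and no prompt evicts a model, and the same request with a longer `keep_alive` loads one. This is a minimal sketch: the helper names are hypothetical, and the model tags in the usage comment are placeholders because the post does not give exact tags.

```python
import json
import urllib.request

# Default Ollama address on the local machine.
OLLAMA_URL = "http://127.0.0.1:11434/api/generate"


def unload_payload(model: str) -> dict:
    """Request body that asks Ollama to evict `model` immediately."""
    return {"model": model, "keep_alive": 0}


def preload_payload(model: str, keep_alive: str = "10m") -> dict:
    """Request body with no prompt, which loads `model` and keeps it warm."""
    return {"model": model, "keep_alive": keep_alive}


def post(payload: dict, url: str = OLLAMA_URL) -> bytes:
    """Send one JSON request to the local Ollama server and return the raw body."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=120) as response:
        return response.read()


# Usage (placeholder tags; substitute whatever `ollama list` reports):
#   post(unload_payload("small-7b-tag"))
#   post(preload_payload("reasoner-32b-tag"))
```

Nothing is sent at import time; the two `post` calls in the comment are what a wrapper script would run before a lane C job.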

What I don’t do

  • Send secrets or VAULT paths to RAG or cloud context
  • Let agents run repo-wide find without a map lookup first
  • Trust agent output without npm run build / cargo test on the spoke
  • Auto-publish Inkwell drafts without reading them

The pattern in one line

Directive → classify → local gather → cloud applies → human checkpoint → chronicle.

That’s how one person ships client sites, desktop alphas, and a trading stack without pretending to be a team of twelve.


What I’d do differently

I should have built the handoff contract (TLDR + JSON artifacts) before scaling to 100 projects — not after the third “why is Cursor slow” meltdown. Earlier Hermes-style memory on the trading VPS would have saved weeks of re-reading JSONL by hand.

Next step: wire QUILL chronicle into the Inkwell brief automatically so lane D drafts start from session memory, not a static file.

Start here if you’re new: Agentic Workflows for Solo Developers · Headroom and Ponytail