Paper Trading Needed a Memory — Hermes on the VPS
Nightly meta-agent for Belfort — trade review, bounded env patches, agent KB updates, and SFT export on the Hostinger VPS.
- trading
- vps
- hermes
- systemd
- homelab
My Belfort trading stack runs on a Hostinger VPS against Alpaca paper. For two months it executed ticks, logged JSONL, and forgot everything by morning. Coaches wrote reports. Nothing fed back into prompts unless I manually edited .env in Cursor at 11 PM. That doesn’t scale — and it’s why paper P&L stalled around −4.8% while the bots looked busy.
This is the sequel to Paper Trading on a VPS: Discipline Over Dopamine — same Hostinger box, same “train local, loop remote” rule, but now the loop remembers.
Yesterday’s push to VPS_Agent adds Hermes — a nightly meta-agent that reviews trades, updates agent knowledge bases, patches bounded config keys, and exports SFT training data. AI-assisted architecture, human-verified guardrails.
Three phases in one repo pull
Phase 1 (May–Jun): Equity dual-policy, a PPO + LLM ensemble on SPY, QQQ, TSLA, NVDA, AAPL, MSFT, AMD. Peak paper equity hit +$5,955 (+6%) on May 29. Then it unraveled: positions concentrated in TSLA, ENSEMBLE_MODE=agree blocked 65% of ticks, and OpenRouter 429s crashed entire decision loops.
Phase 2 (Jun 11+): Pivot to SPY/QQQ options only via bot_options.py. LLM picks direction, structure (calls, puts, debit spreads), expiration (0DTE to weekly). Code enforces hard stops (−20% on premium), position caps (10% per trade, 30% deployed), and 15-second decision latency budgets. PPO is off for this phase — the LLM is the reasoning layer, not a gate.
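To make those code-enforced limits concrete, here is a minimal sketch of how the checks could look. The percentages and the 15-second budget come from the paragraph above; the function names, the Proposal type, and the dollar-based accounting are assumptions for illustration, not the actual contents of bot_options.py.

```python
from dataclasses import dataclass

# Limits quoted in the post; the constant names are hypothetical.
HARD_STOP_PCT = -0.20          # exit once premium is down 20%
MAX_TRADE_FRACTION = 0.10      # at most 10% of equity in one trade
MAX_DEPLOYED_FRACTION = 0.30   # at most 30% of equity deployed overall
LATENCY_BUDGET_S = 15.0        # slower decisions are discarded


@dataclass
class Proposal:
    cost: float              # premium the new trade would spend, in dollars
    decision_seconds: float  # how long the LLM took to answer


def allow_entry(p: Proposal, equity: float, deployed: float) -> tuple[bool, str]:
    """Return (allowed, reason) for a proposed entry."""
    if p.decision_seconds > LATENCY_BUDGET_S:
        return False, "latency budget exceeded"
    if p.cost > MAX_TRADE_FRACTION * equity:
        return False, "per-trade cap exceeded"
    if deployed + p.cost > MAX_DEPLOYED_FRACTION * equity:
        return False, "deployed cap exceeded"
    return True, "ok"


def should_stop_out(entry_premium: float, current_premium: float) -> bool:
    """True once the position has lost 20% or more of its entry premium."""
    change = (current_premium - entry_premium) / entry_premium
    return change <= HARD_STOP_PCT
```

The point of the design is that the LLM only proposes; every cap is checked in plain code after the model answers, so a bad completion cannot widen a limit.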
Phase 3 (Jun 24): Hermes closes the learning loop.
What Hermes actually does
Hermes (scripts/hermes.py) runs on a systemd timer — belfort-coach.timer at ~22:15 UTC weekdays. It uses DeepSeek locally (HERMES_LLM_BACKEND=deepseek_local) and operates in three tiers:
| Tier | Capability | Auto? |
|---|---|---|
| 1 | Trade review, counterfactual stats, lessons → agent KB | Yes |
| 2 | Bounded .env patches + backup + rollback | Yes (HERMES_AUTO_APPLY=true) |
| 3 | Strategy rethink, training_focus, SFT JSONL export | Yes (HERMES_TRAIN_EXPORT=true) |
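The tier table can be read as a small gating function. The sketch below assumes Tier 1 always runs and that the two flags are treated as off when unset; the flag names are from the table, while the helper names and the set of accepted truthy strings are my own assumptions rather than what scripts/hermes.py necessarily does.

```python
import os
from typing import Mapping

# Assumption: these strings count as "on"; anything else, or unset, is off.
TRUTHY = {"1", "true", "yes", "on"}


def env_flag(env: Mapping[str, str], name: str) -> bool:
    """Read a boolean flag from an env-like mapping."""
    return env.get(name, "").strip().lower() in TRUTHY


def enabled_tiers(env: Mapping[str, str] = os.environ) -> list[int]:
    """Return the Hermes tiers that should run for this environment."""
    tiers = [1]  # Tier 1 (review + KB lessons) is always automatic
    if env_flag(env, "HERMES_AUTO_APPLY"):
        tiers.append(2)  # bounded .env patches with backup and rollback
    if env_flag(env, "HERMES_TRAIN_EXPORT"):
        tiers.append(3)  # strategy rethink and SFT JSONL export
    return tiers
```

Defaulting to off means a missing or mistyped flag degrades Hermes to review-only instead of letting it patch config by accident.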
First VPS run on Jun 24: the momentum filter had skipped 9 ticks on NVDA, AMD, TSLA, and IWM. It was a red day at −2.64%, so Hermes blocked any loosening and instead tightened OPTIONS_MOMENTUM_M5_PCT (0.3→0.35) and BURRY_MIN_BEAR_SCORE (3→4). It exported 11 options + 2 burry examples to logs/hermes/training/ for weekend SFT on the MSI.
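Here is a rough sketch of how a bounded patch filter with a red-day block might work. The two keys, the "tightening means raising the value" direction, and the cap of three patches per run come from this post; the numeric ranges, the Bound type, and filter_patches are invented for illustration and are not the real Hermes code.

```python
from dataclasses import dataclass

MAX_PATCHES_PER_RUN = 3  # cap mentioned in the post


@dataclass(frozen=True)
class Bound:
    lo: float
    hi: float
    tighter_is_higher: bool = True  # raising both keys tightened them on Jun 24


# Hypothetical allowlist: keys are real, ranges are made up for the example.
ALLOWLIST = {
    "OPTIONS_MOMENTUM_M5_PCT": Bound(0.1, 1.0),
    "BURRY_MIN_BEAR_SCORE": Bound(1, 6),
}


def filter_patches(current: dict, proposed: dict, red_day: bool) -> dict:
    """Return only the proposed patches that pass every guardrail."""
    accepted = {}
    for key, new in proposed.items():
        bound = ALLOWLIST.get(key)
        if bound is None or key not in current:
            continue  # not allowlisted: never touched
        if not bound.lo <= new <= bound.hi:
            continue  # outside the permitted range
        old = current[key]
        if new == old:
            continue  # nothing to change
        loosens = new < old if bound.tighter_is_higher else new > old
        if red_day and loosens:
            continue  # red-day block: no loosening after a losing day
        if len(accepted) >= MAX_PATCHES_PER_RUN:
            break  # per-run cap reached
        accepted[key] = new
    return accepted
```

Fed the Jun 24 values on a red day, this filter would accept both tightening patches and silently drop anything outside the allowlist, such as an API key.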
We also added belfort-hermes-hourly.timer, which runs intraday KB updates during regular trading hours (RTH) without touching .env or the SFT export. Nightly Hermes still owns the full strategy pass.
Bullpen agents and MSI training
The pull wasn’t just Hermes. New agents joined the bullpen:
- bot_options.py: SPY/QQQ options with Alpaca chain integration
- bot_bear.py (Burry): bearish pattern scoring with its own KB
- Agent KB system: agent_kb.py, per-agent JSON (options_kb.json, belfort_kb.json, etc.), human-readable mirrors in .agents/KB/
- MSI overnight pipeline: overnight-train.ps1, sync-datalake-from-nobara.ps1, docs/MSI_TRAINING_ARCHITECTURE.md for training on the swarm worker while the VPS trades
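For a feel of what a per-agent JSON KB involves, here is a small sketch of appending a lesson to a file like options_kb.json. The file naming follows the list above; the schema (an "agent" field plus a "lessons" list), the lesson cap, and the append_lesson helper are assumptions for illustration, not the real agent_kb.py.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def append_lesson(kb_dir: Path, agent: str, lesson: str, max_lessons: int = 50) -> dict:
    """Append one lesson to <agent>_kb.json and keep only the newest entries."""
    kb_dir.mkdir(parents=True, exist_ok=True)
    path = kb_dir / f"{agent}_kb.json"
    if path.exists():
        kb = json.loads(path.read_text(encoding="utf-8"))
    else:
        kb = {"agent": agent, "lessons": []}  # hypothetical schema
    kb["lessons"].append(
        {"ts": datetime.now(timezone.utc).isoformat(), "text": lesson}
    )
    kb["lessons"] = kb["lessons"][-max_lessons:]  # bound the KB size
    # Write to a temp file, then swap it in, so a crash cannot leave half a file.
    tmp = path.with_suffix(".json.tmp")
    tmp.write_text(json.dumps(kb, indent=2), encoding="utf-8")
    tmp.replace(path)
    return kb
```

Capping the list keeps the KB small enough to paste into a prompt, and the temp-file swap matters because the hourly and nightly runs both write to the same files.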
Pull lessons to your PC with pull-agent-kb.ps1. Push back after MSI fine-tune with push-agent-kb.ps1. Rollback bad env patches: scripts/hermes_rollback.py --list.
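The backup-then-rollback idea behind env patching can be sketched in a few lines. This is not the code in scripts/hermes_rollback.py; the backup file naming, the directory layout, and all three function names are assumptions made for the example.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path


def backup_env(env_path: Path, backup_dir: Path) -> Path:
    """Copy the env file to a timestamped backup before any patch is applied."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    # Microsecond UTC stamp: unique per call and sortable as plain text.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S%fZ")
    dest = backup_dir / f"env.{stamp}.bak"
    shutil.copy2(env_path, dest)
    return dest


def list_backups(backup_dir: Path) -> list[Path]:
    """Return backups oldest first."""
    return sorted(backup_dir.glob("env.*.bak"))


def rollback(env_path: Path, backup_dir: Path) -> Path:
    """Restore the most recent backup over the env file and return its path."""
    backups = list_backups(backup_dir)
    if not backups:
        raise FileNotFoundError("no backups to restore")
    shutil.copy2(backups[-1], env_path)
    return backups[-1]
```

Because restoring is just a file copy with no LLM involved, the rollback path stays deterministic even when the patch that needs undoing came from a model.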
Pairs with dual-GPU Ollama on Nobara for local fine-tune — VPS trades, MSI learns, hub orchestrates.
What I’d do differently
I should have built Hermes before Phase 2. We pivoted to options with fresh guardrails but no memory — the same mistake as Phase 1. Letting an LLM auto-apply env patches scared me until we added allowlists, max 3 patches per run, red-day blocks on loosening, and deterministic rollback. That constraint design took longer than the DeepSeek integration.
Next step: weekend SFT on exported Hermes JSONL, deploy weights to Ollama on the MSI, and watch whether Burry’s tightened bear score actually reduces junk entries. Paper trading is discipline over dopamine — Hermes is how the stack remembers that lesson without me re-reading logs at midnight.