Self-hosted memory stack

Your AI never forgets. Until you tell it to.

A self-hosted memory and multi-agent orchestration layer for power users who think in systems, not prompts. Hermes Agent + Honcho memory + Kanban CI/CD — all on your infra.

Hermes Agent Honcho Memory LLM Wiki Skills System Kanban CI/CD Coolify Deploy Multi-Profile Self-Hosted Session Intelligence Dialectic Reasoning Hermes Agent Honcho Memory LLM Wiki Skills System Kanban CI/CD Coolify Deploy Multi-Profile Self-Hosted Session Intelligence Dialectic Reasoning

Three-layer operational stack

Agent → Memory → Orchestration. Every layer runs on your infrastructure, not ours.

A

Agent Layer

Hermes Agent CLI + Gateway. Routes prompts through profiles, applies skills, manages tool dispatch. Single binary, no cloud dependency.

M

Memory Layer

Honcho two-layer context injection. Base (session summary, user model, peer card) plus dialectic supplement at configurable depths.

O

Orchestration Layer

Kanban multi-agent dispatcher with SQLite backend. Fan-out, dependencies, retries — full lifecycle for autonomous agent pipelines.

Two layers, independent cadences

Context is injected at API-call time into the user message (not system prompt), preserving prompt caching while keeping memory fresh.

Base Context layer 1

Session summary, user model, peer card, AI identity — injected every API call into the user message. Captures who you are and what you're doing.

Dialectic Supplement layer 2

Multi-pass .chat() reasoning at configurable depth: single pass, audit+synthesis, or full audit+synthesis+reconciliation. Thinks before it remembers.

Recall Mode config

Three strategies: context (hidden, automatic), tools (visible, on-demand), or hybrid (both). Twist the dial, not the architecture.

Honcho Tools
01
honcho_profile
Retrieve or update the peer card — name, role, preferences, communication style
read/write
02
honcho_search
Semantic search over stored context, ranked by relevance. No LLM synthesis.
read
03
honcho_context
Full session context snapshot — summary, peer card, recent messages
read
04
honcho_reasoning
Natural language query with synthesized answer at configurable depth
think
05
honcho_conclude
Write persistent facts. Self-healing — incorrect conclusions fade over time.
write

Four strategies, priority resolution

Multiple AI identities share one user workspace. Each builds its own observations — sessions are isolated by design.

Multi-Profile Isolation default

Multiple AI profiles (researcher, coder, writer) share one user workspace. Each profile maintains its own observation space — no context leakage between roles. Gateway platforms always enforce per-chat isolation regardless of strategy setting (priority 3 — highest).

Per-Directory

Sessions scoped to workspace directories. Different projects get different memory contexts automatically.

Per-Repo

Git repository-based isolation. Switch between codebases; memory follows the repo boundary.

Per-Session

Fresh context every time. No cross-session memory bleed — ideal for one-shot tasks or testing.

Global

Unified memory across all sessions. Persistent across every conversation — the AI remembers everything everywhere.

LLM Wiki + Skills System

A persistent knowledge base that the AI reads and writes. 52 pages spanning entities, concepts, and comparisons.

52
Wiki Pages
12
Entities
34+
Concepts
6
Sections

Wiki Structure

  • raw/ — Immutable sources, never modified
  • entities/ — People, projects, tokens, protocols
  • concepts/ — Technical concepts, APIs, guides
  • comparisons/ — Side-by-side evaluations
  • queries/ — Saved analyses from past questions
  • talk/ — Discussions and planning

YAML frontmatter · [[wikilinks]] · Source citations inline · index.md + log.md

Skills System

  • SKILL.md YAML format with frontmatter
  • Bundle grouping for related skills
  • Conditional loading per task type
  • Injected into user message (preserves prompt caching)
hermes-agent kanban-worker frontend-design research writing-plans deep-research service-now

Skill Paths

  • ~/.hermes/skills/ — System skills
  • ~/.hermes/skill-bundles/ — Grouped bundles
  • ~/.hermes/hermes-agent/skills/ — Agent-internal

Multi-agent pipeline from board to cloud

Six-step worker lifecycle with structured handoffs. SQLite-backed, no external dependencies.

1. Orient
kanban_show — read task state, context, parent outputs
2. Work
Inside $HERMES_KANBAN_WORKSPACE — build, code, research
3. Heartbeat
Keep-alive for long operations (training, crawling, encoding)
4. Block
Human-in-the-loop when decisions are needed
5. Complete
Structured handoff with summary + metadata
6. Fan-out
Create child tasks — don't scope-creep, delegate

Workspace Kinds

Scratch Fresh tmp dir, GC'd on archive
Dir Shared persistent directory
Worktree Git worktree, commit on finish

Lifecycle Statuses

Ready Awaiting dispatch
Running Worker active
Blocked Waiting on human
Done Completed with handoff

Note: Kanban manages the agent build pipeline; Coolify handles the final HTTPS deployment as a separate release step.

Self-hosted core. Public-safe surface.

The operational stack is designed around local control and layered isolation. This public page exposes architecture patterns only — not private memory, credentials, or raw session data.

🛡

Multi-Profile Isolation

Separate observation spaces per profile. Your researcher doesn't see your coder's context, and vice versa.

🔒

Gateway Isolation

Per-chat isolation always enforced at the gateway layer. Platform channels never leak across conversations.

🎯

Directional Observation

observationMode: directional as default — the AI only records what's directly relevant, not everything.

Never Public

API keys & credentials Internal IPs, ports, hostnames Peer cards & session history Trading positions & P&L Wiki raw/ source documents Task IDs & run IDs Database connection strings Wallet & account addresses