Self-hosted memory stack

Your AI never forgets. Until you tell it to.

A self-hosted memory and multi-agent orchestration layer for power users who think in systems, not prompts. Hermes Agent + Honcho memory + Kanban CI/CD — all on your infra.

Explore the stack → Privacy first

◆ Hermes Agent ◆ Honcho Memory ◆ LLM Wiki ◆ Skills System ◆ Kanban CI/CD ◆ Coolify Deploy ◆ Multi-Profile ◆ Self-Hosted ◆ Session Intelligence ◆ Dialectic Reasoning ◆ Hermes Agent ◆ Honcho Memory ◆ LLM Wiki ◆ Skills System ◆ Kanban CI/CD ◆ Coolify Deploy ◆ Multi-Profile ◆ Self-Hosted ◆ Session Intelligence ◆ Dialectic Reasoning

Architecture

Three-layer operational stack

Agent → Memory → Orchestration. Every layer runs on your infrastructure, not ours.

Agent Layer

Hermes Agent CLI + Gateway. Routes prompts through profiles, applies skills, manages tool dispatch. Single binary, no cloud dependency.

▲

Memory Layer

Honcho two-layer context injection. Base (session summary, user model, peer card) plus dialectic supplement at configurable depths.

▲

Orchestration Layer

Kanban multi-agent dispatcher with SQLite backend. Fan-out, dependencies, retries — full lifecycle for autonomous agent pipelines.

Memory System

Two layers, independent cadences

Context is injected at API-call time into the user message (not system prompt), preserving prompt caching while keeping memory fresh.

Base Context layer 1

Session summary, user model, peer card, AI identity — injected every API call into the user message. Captures who you are and what you're doing.

Dialectic Supplement layer 2

Multi-pass .chat() reasoning at configurable depth: single pass, audit+synthesis, or full audit+synthesis+reconciliation. Thinks before it remembers.

Recall Mode config

Three strategies: context (hidden, automatic), tools (visible, on-demand), or hybrid (both). Twist the dial, not the architecture.

Honcho Tools

honcho_profile

Retrieve or update the peer card — name, role, preferences, communication style

read/write

honcho_search

Semantic search over stored context, ranked by relevance. No LLM synthesis.

read

honcho_context

Full session context snapshot — summary, peer card, recent messages

read

honcho_reasoning

Natural language query with synthesized answer at configurable depth

think

honcho_conclude

Write persistent facts. Self-healing — incorrect conclusions fade over time.

write

Session Intelligence

Four strategies, priority resolution

Multiple AI identities share one user workspace. Each builds its own observations — sessions are isolated by design.

Multi-Profile Isolation default

Multiple AI profiles (researcher, coder, writer) share one user workspace. Each profile maintains its own observation space — no context leakage between roles. Gateway platforms always enforce per-chat isolation regardless of strategy setting (priority 3 — highest).

Per-Directory

Sessions scoped to workspace directories. Different projects get different memory contexts automatically.

Per-Repo

Git repository-based isolation. Switch between codebases; memory follows the repo boundary.

Per-Session

Fresh context every time. No cross-session memory bleed — ideal for one-shot tasks or testing.

Global

Unified memory across all sessions. Persistent across every conversation — the AI remembers everything everywhere.

Knowledge Base

LLM Wiki + Skills System

A persistent knowledge base that the AI reads and writes. 52 pages spanning entities, concepts, and comparisons.

Wiki Pages

Entities

34+

Concepts

Sections

Wiki Structure

raw/ — Immutable sources, never modified
entities/ — People, projects, tokens, protocols
concepts/ — Technical concepts, APIs, guides
comparisons/ — Side-by-side evaluations
queries/ — Saved analyses from past questions
talk/ — Discussions and planning

YAML frontmatter · [[wikilinks]] · Source citations inline · index.md + log.md

Skills System

SKILL.md YAML format with frontmatter
Bundle grouping for related skills
Conditional loading per task type
Injected into user message (preserves prompt caching)

hermes-agent kanban-worker frontend-design research writing-plans deep-research service-now

Skill Paths

~/.hermes/skills/ — System skills
~/.hermes/skill-bundles/ — Grouped bundles
~/.hermes/hermes-agent/skills/ — Agent-internal

Deployment & Orchestration

Multi-agent pipeline from board to cloud

Six-step worker lifecycle with structured handoffs. SQLite-backed, no external dependencies.

1. Orient

kanban_show — read task state, context, parent outputs

2. Work

Inside $HERMES_KANBAN_WORKSPACE — build, code, research

3. Heartbeat

Keep-alive for long operations (training, crawling, encoding)

4. Block

Human-in-the-loop when decisions are needed

5. Complete

Structured handoff with summary + metadata

6. Fan-out

Create child tasks — don't scope-creep, delegate

Workspace Kinds

Scratch Fresh tmp dir, GC'd on archive

Dir Shared persistent directory

Worktree Git worktree, commit on finish

Lifecycle Statuses

Ready Awaiting dispatch

Running Worker active

Blocked Waiting on human

Done Completed with handoff

Note: Kanban manages the agent build pipeline; Coolify handles the final HTTPS deployment as a separate release step.

Privacy & Security

Self-hosted core. Public-safe surface.

The operational stack is designed around local control and layered isolation. This public page exposes architecture patterns only — not private memory, credentials, or raw session data.

🛡

Multi-Profile Isolation

Separate observation spaces per profile. Your researcher doesn't see your coder's context, and vice versa.

🔒

Gateway Isolation

Per-chat isolation always enforced at the gateway layer. Platform channels never leak across conversations.

🎯

Directional Observation

observationMode: directional as default — the AI only records what's directly relevant, not everything.

Never Public

API keys & credentials Internal IPs, ports, hostnames Peer cards & session history Trading positions & P&L Wiki raw/ source documents Task IDs & run IDs Database connection strings Wallet & account addresses