MCP Server¶

Engramia speaks the Model Context Protocol (MCP) over two transports:

stdio (default, self-host) — every tier, runs as a local subprocess. Suitable for Claude Desktop / Cursor / Windsurf installed on the developer's machine.
Streamable HTTP (hosted, Engramia Cloud) — Team tier and above. No local install; the MCP client connects directly to https://api.engramia.dev/v1/mcp with a Bearer API key.

Both transports expose the same tools (subject to per-tier filtering on the hosted side — see Tools below).

stdio transport (self-host)¶

Installation¶

pip install "engramia[openai,mcp]"

Running¶

engramia-mcp

MCP clients launch this binary as a subprocess automatically based on their config.

Client configuration¶

Claude Desktop¶

Config file location:

Linux/macOS: ~/.config/claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "engramia": {
      "command": "engramia-mcp",
      "env": {
        "ENGRAMIA_DATA_PATH": "/path/to/engramia_data",
        "OPENAI_API_KEY": "sk-..."
      }
    }
  }
}

Cursor / Windsurf¶

Use the same JSON format in the IDE's MCP server settings.

Hosted MCP transport (Engramia Cloud)¶

Tier requirement: Team plan or higher. Developer and Pro tiers cannot open hosted MCP sessions; the cloud returns HTTP 402 Payment Required.

Why use the hosted transport¶

No local Engramia install — the MCP client just needs an HTTP URL and a Bearer key. Whole-team rollouts (Claude Desktop deployed to N developers) become trivial.
Centralised pattern store — every team member's engramia_learn and engramia_recall hits the same backing storage as your REST API tenants use, so memories stay shared.
BYOK by default — the LLM keys used for engramia_evaluate and engramia_evolve are the tenant's own (configured in the dashboard under Credentials), not Engramia's. You're not paying for Engramia's inference budget.

Concurrent-session limits¶

Each tenant has a tier-defined cap on concurrent open MCP sessions. Sessions that go idle for 30 minutes are automatically closed and free a slot.

Tier	Max concurrent MCP sessions
Developer	— (no access)
Pro	— (no access)
Team	5
Business	25
Enterprise	100

When the cap is reached the cloud returns HTTP 429 with Retry-After: 60.

Client configuration¶

Claude Desktop¶

{
  "mcpServers": {
    "engramia": {
      "transport": "http",
      "url": "https://api.engramia.dev/v1/mcp",
      "headers": {
        "Authorization": "Bearer ENG_xxx"
      }
    }
  }
}

(Exact key names follow your MCP client's config schema; the example above matches Claude Desktop ≥ build 0.10 which supports HTTP transport. For older builds, fall back to stdio.)

Cursor / Windsurf / IDE plugins¶

Set the transport to "Streamable HTTP" (or equivalent) and point at https://api.engramia.dev/v1/mcp. The IDE will issue the MCP initialize handshake on startup; Engramia replies with the session ID, which the client carries on subsequent requests via the Mcp-Session-Id header.

Self-host the hosted transport¶

If you run Engramia on your own infrastructure and want to expose Streamable HTTP rather than stdio, set the feature flag:

ENGRAMIA_MCP_HOSTED_ENABLED=true
ENGRAMIA_MCP_SESSION_IDLE_SECONDS=1800     # 30 min default
ENGRAMIA_MCP_LIMITS_TEAM=5                 # override caps if desired
ENGRAMIA_MCP_LIMITS_BUSINESS=25
ENGRAMIA_MCP_LIMITS_ENTERPRISE=100

The /v1/mcp route mounts onto the same FastAPI app as the REST API. If you front Engramia with Caddy/nginx/another reverse proxy, ensure SSE buffering is disabled on that path — the Caddy snippet below is the reference:

@mcp path /v1/mcp /v1/mcp/*
handle @mcp {
    reverse_proxy {$UPSTREAM_API:engramia-api:8000} {
        flush_interval -1
        transport http {
            response_header_timeout 30m
            read_buffer 8KiB
        }
    }
}

Tools¶

The catalog is shared by both transports. On hosted MCP it is filtered per the caller's tier and RBAC role; on stdio it is exposed in full (self-host is single-tenant).

Tool	Description	RBAC	Hosted tier
`engramia_learn`	Store a successful run as a reusable pattern	editor+	Team+
`engramia_recall`	Semantic search over stored patterns	reader+	Team+
`engramia_evaluate`	N independent LLM evaluations, median + variance	editor+	Team+
`engramia_feedback`	Recurring quality issues for prompt injection	reader+	Team+
`engramia_metrics`	Run/pattern/reuse statistics	reader+	Team+
`engramia_aging`	Time-decay + prune stale patterns	editor+	Team+
`engramia_compose`	Decompose a task into a multi-agent pipeline (experimental)	editor+	Business+
`engramia_evolve`	Generate an improved system prompt from failure feedback	editor+	Business+
`engramia_analyze_failures`	Cluster recurring failure feedback into systemic issues	editor+	Business+

The hosted transport returns only the tools the caller is allowed to use in tools/list responses. Tools below the caller's tier are not surfaced at all (to keep the MCP client UI free of permanent "upgrade required" chrome). Discoverability for paywall upsell happens in the dashboard.

Configuration¶

Both transports use the same environment variables as the REST API:

Variable	Default	Description
`ENGRAMIA_STORAGE`	`json`	`json` or `postgres`
`ENGRAMIA_DATA_PATH`	`./engramia_data`	Path for JSON storage
`ENGRAMIA_DATABASE_URL`	—	PostgreSQL URL (postgres mode only)
`ENGRAMIA_LLM_PROVIDER`	`openai`	LLM provider
`ENGRAMIA_LLM_MODEL`	`gpt-4.1`	Model ID
`OPENAI_API_KEY`	—	OpenAI API key
`ENGRAMIA_MCP_HOSTED_ENABLED`	`false`	Mount hosted Streamable HTTP transport at `/v1/mcp`
`ENGRAMIA_MCP_SESSION_IDLE_SECONDS`	`1800`	Hosted session idle timeout
`ENGRAMIA_MCP_LIMITER_BACKEND`	`inmemory`	`inmemory` (default) or `redis` (planned)
`ENGRAMIA_MCP_LIMITS_TEAM`	`5`	Max concurrent hosted sessions for Team-tier tenants
`ENGRAMIA_MCP_LIMITS_BUSINESS`	`25`	Same, Business
`ENGRAMIA_MCP_LIMITS_ENTERPRISE`	`100`	Same, Enterprise

Usage example¶

Once configured, you can ask Claude Desktop / Cursor to use Engramia tools directly:

"Use engramia_learn to store this successful code with eval_score 8.5"

"Use engramia_recall to find patterns similar to 'parse CSV and compute statistics'"

"Use engramia_metrics to show current memory statistics"

The MCP tools accept the same parameters as the Python API — see API Reference for details.