Skip to content

1. Concept mapping

Before writing any migration code, get a clear picture of how Assistants API primitives map onto Engramia's. Most map cleanly. Where they don't, you usually get something stronger (eval-weighted ranking, RBAC, audit log).

The mapping table

OpenAI Assistants Engramia Why it differs
Thread (per-user message history) Scope (tenant_id, project_id) Engramia scopes are multi-user-aware. One Thread maps to one project; many users can share that project under RBAC.
Message (linear, append-only) Pattern (task, code, eval_score) Patterns are eval-weighted and decayable. Recall ranks them by outcome quality, not chronological position.
File (vector store per assistant) Embedding (pgvector + HNSW) Files in Engramia are scope-aware — one upload, many readers. No per-assistant rebind.
Run (transient request state) Async job (DB-backed queue) Runs in Engramia survive restarts. Use Prefer: respond-async to opt in.
Tool (assistant-bound, JSON schema) Skill (cross-tenant searchable) Skills are first-class entities you can search, evaluate, and reuse across tenants — not bound to one assistant.
Assistant (instructions + tools + model) Agent (Agents SDK) + Engramia memory The agent definition stays in your code. Engramia replaces the persistence layer underneath it.

Three concrete consequences

a) Scopes are not Threads

OpenAI's Thread is a single-user, single-conversation container. Engramia's scope is a multi-user, multi-session boundary that enforces RBAC and audit logging.

  • One Engramia project ≈ one product feature, one team, or one customer (depending on your isolation model).
  • Within a scope, there is no "conversation" object — patterns are recalled by semantic similarity to the current task, not by who said what when.
  • If you genuinely need per-user isolation, run one tenant or project per user. The hosted plans size for this.

b) Recall replaces "load the thread"

In Assistants:

thread = client.beta.threads.retrieve(thread_id)
# all messages get re-sent on the next run

In Engramia:

matches = memory.recall(task="Build a CSV parser", limit=3)
# only the top-3 most relevant patterns are injected

The default recall window is 3 patterns, ranked by (similarity × success_score × recency_decay). You can override limit and add min_score filters. See Recall API.

c) Files are not files

OpenAI files are blobs you upload and bind to one assistant's vector store. Engramia uses embeddings — vectors stored in pgvector with an HNSW index.

  • You upload text content (or extracted text from PDFs/images), not raw binary.
  • One embedding can be recalled by any agent in the same scope.
  • Storage cost is per-vector, not per-file. A 50-page PDF chunked into 100 vectors costs the same as 100 short notes.

If your assistant relied on multimodal (image/audio) files, you'll need to either pre-extract text or use the multimodal-capable provider directly (the OpenAI Agents SDK supports this) — Engramia stores text embeddings only.

Next

Now that the primitives line up in your head, see Cutover code for the actual diff.