Concepts & Architecture
Overview
Engramia is a memory layer that sits under any AI agent framework. It provides a closed-loop learning system where agents improve over time by learning from past runs.
┌─────────────────────────────────────┐
│        Your Agent Framework         │
│   (LangChain, CrewAI, custom...)    │
└──────────────┬──────────────────────┘
               │
       ┌───────▼───────┐
       │   Engramia    │
       │    Memory     │
       └───────┬───────┘
               │
    ┌──────────┼──────────┐
    ▼          ▼          ▼
┌──────┐  ┌────────┐  ┌────────┐
│ LLM  │  │Embed-  │  │Storage │
│      │  │dings   │  │        │
└──────┘  └────────┘  └────────┘
Core loop
The fundamental Engramia loop is:
- Learn — After an agent run, store the task, code, eval score, and output as a success pattern
- Recall — Before a new run, find relevant past patterns via semantic search
- Evaluate — Score the output with multiple independent LLM evaluators
- Improve — Inject recurring feedback into prompts, evolve prompts automatically
Over time, patterns with high eval scores float to the top, while stale patterns decay and get pruned.
Key concepts
Success patterns
A pattern is a record of a successful agent run:
- task — what the agent was asked to do
- design — the code/solution produced
- success_score — quality rating (0–10)
- reuse_count — how many times this pattern has been recalled
- timestamp — when it was stored
Patterns are the primary unit of memory. They are stored with embeddings for semantic search and subject to time-based decay.
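The fields above can be sketched as a small record type. This is an illustrative stand-in built on a standard-library dataclass, not the actual model from types.py; the range check and the UTC default timestamp are assumptions added for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Pattern:
    """Illustrative stand-in for a stored success pattern (not the real model)."""

    task: str              # what the agent was asked to do
    design: str            # the code/solution produced
    success_score: float   # quality rating on the 0-10 scale
    reuse_count: int = 0   # how many times the pattern has been recalled
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

    def __post_init__(self) -> None:
        # Assumed validation: reject scores outside the documented range.
        if not 0.0 <= self.success_score <= 10.0:
            raise ValueError("success_score must be between 0 and 10")


pattern = Pattern(
    task="Count the rows in a parsed CSV file",
    design="def count_rows(rows): return len(rows)",
    success_score=8.5,
)
```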
Pattern aging
Patterns lose relevance over time. Engramia applies a 2% decay per week to each pattern's success score.
When a score drops below 0.1, the pattern is pruned. This ensures the memory stays fresh — new, high-quality patterns naturally replace old ones.
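A minimal sketch of the aging rule, assuming the 2% weekly decay compounds (the text does not say whether the decay is compounding or linear). The function names are hypothetical.

```python
WEEKLY_DECAY = 0.02     # 2% per week, as stated above
PRUNE_THRESHOLD = 0.1   # patterns scoring below this are pruned


def decayed_score(score: float, weeks_elapsed: float) -> float:
    """Apply weekly decay to a success score (compounding form is assumed)."""
    return score * (1.0 - WEEKLY_DECAY) ** weeks_elapsed


def should_prune(score: float, weeks_elapsed: float) -> bool:
    """True once the decayed score has fallen below the prune threshold."""
    return decayed_score(score, weeks_elapsed) < PRUNE_THRESHOLD
```

Under this assumption a pattern stored with a score of 8.0 sits at roughly 6.54 after ten weeks, so a good pattern stays in memory for a long time unless better ones outrank it.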
Reuse boost
When a pattern is recalled and used, its reuse count increments and its effective score gets a +0.1 boost (capped at 10.0). Frequently useful patterns survive longer.
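The boost can be sketched in a few lines. The function name and the tuple return are illustrative choices; only the +0.1 increment and the 10.0 cap come from the description above.

```python
MAX_SCORE = 10.0
REUSE_BOOST = 0.1


def record_reuse(score: float, reuse_count: int) -> tuple[float, int]:
    """Return the boosted score (capped at MAX_SCORE) and the new reuse count."""
    return min(score + REUSE_BOOST, MAX_SCORE), reuse_count + 1
```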
Eval-weighted search
When recalling patterns, similarity scores are multiplied by an eval quality multiplier in the range [0.5, 1.0]:
- Patterns with consistently high eval scores get a multiplier close to 1.0
- Patterns with little or no eval history get the neutral multiplier, 0.75
- This means high-quality patterns rank higher even if their embedding similarity is slightly lower
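The re-ranking can be illustrated as follows. The text gives only the range [0.5, 1.0] and the neutral value 0.75, so the linear mapping from mean eval score to multiplier is an assumption made for this sketch, as are the function names and the tuple layout of a candidate.

```python
from collections.abc import Sequence

NEUTRAL_MULTIPLIER = 0.75  # used when a pattern has no eval history

# (name, embedding similarity, eval scores on the 0-10 scale)
Candidate = tuple[str, float, Sequence[float]]


def quality_multiplier(eval_scores: Sequence[float]) -> float:
    """Map eval history to a multiplier in [0.5, 1.0] (linear mapping assumed)."""
    if not eval_scores:
        return NEUTRAL_MULTIPLIER
    mean = sum(eval_scores) / len(eval_scores)
    clamped = max(0.0, min(mean, 10.0))
    return 0.5 + 0.5 * clamped / 10.0


def rank(candidates: Sequence[Candidate]) -> list[str]:
    """Order candidate names by similarity times quality multiplier, best first."""
    scored = [
        (similarity * quality_multiplier(evals), name)
        for name, similarity, evals in candidates
    ]
    return [name for _, name in sorted(scored, reverse=True)]


ranking = rank([
    ("a", 0.90, [4.0, 5.0]),   # closest match, mediocre evals
    ("b", 0.85, [9.0, 9.5]),   # slightly further away, strong evals
    ("c", 0.80, []),           # no eval history, neutral multiplier
])
```

Here "b" outranks "a" even though its raw similarity is lower, which is the effect the last bullet describes.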
Feedback clustering
Engramia tracks recurring quality issues from evaluations. Feedback strings are clustered using Jaccard similarity (threshold > 0.4). When a feedback cluster reaches count >= 2, it becomes available for injection into prompts.
Feedback also decays at 10% per week, so transient issues fade while persistent problems stay visible.
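A rough sketch of the clustering step. It assumes Jaccard similarity is computed over lowercase word sets and that each new string is compared with the first member of every existing cluster; the real tokenization and clustering strategy may differ. The weekly decay is left out for brevity.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity between the lowercase word sets of two strings."""
    set_a, set_b = set(a.lower().split()), set(b.lower().split())
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def cluster_feedback(items: list[str], threshold: float = 0.4) -> list[list[str]]:
    """Greedy clustering: join the first cluster whose representative is similar."""
    clusters: list[list[str]] = []
    for item in items:
        for cluster in clusters:
            if jaccard(item, cluster[0]) > threshold:
                cluster.append(item)
                break
        else:
            # No existing cluster was similar enough, so start a new one.
            clusters.append([item])
    return clusters


def injectable(clusters: list[list[str]], min_count: int = 2) -> list[str]:
    """Representatives of clusters that recur often enough to inject."""
    return [cluster[0] for cluster in clusters if len(cluster) >= min_count]


feedback = [
    "missing error handling for empty input",
    "missing error handling for invalid input",
    "output is not sorted",
]
clusters = cluster_feedback(feedback)
recurring = injectable(clusters)
```

The first two strings share five of seven distinct words, so they form one cluster and become injectable; the third stays a singleton and is not injected.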
Contract validation
When composing multi-stage pipelines, each stage declares what data it reads and writes. Engramia validates:
- Every input a stage reads must be produced by a prior stage (or be an initial input)
- No circular dependencies exist in the data flow
- The pipeline forms a valid DAG
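The three checks can be sketched with the standard library's graphlib module. The Stage type, the validate_pipeline name, and the rule that each key has exactly one producer are assumptions made for the example; here the execution order is derived from the data flow, so "prior stage" means "earlier in the resulting topological order".

```python
from dataclasses import dataclass, field
from graphlib import CycleError, TopologicalSorter


@dataclass
class Stage:
    """Hypothetical stage contract: the keys it reads and the keys it writes."""

    name: str
    reads: set[str] = field(default_factory=set)
    writes: set[str] = field(default_factory=set)


def validate_pipeline(stages: list[Stage], initial_inputs: set[str]) -> list[str]:
    """Return stage names in a valid execution order, or raise ValueError."""
    producers: dict[str, str] = {}
    for stage in stages:
        for key in stage.writes:
            if key in producers:  # simplifying assumption: one producer per key
                raise ValueError(f"'{key}' is written by more than one stage")
            producers[key] = stage.name

    # Map each stage to the set of stages that must run before it.
    graph: dict[str, set[str]] = {stage.name: set() for stage in stages}
    for stage in stages:
        for key in stage.reads:
            if key in initial_inputs:
                continue
            if key not in producers:
                raise ValueError(f"{stage.name} reads '{key}', which nothing produces")
            graph[stage.name].add(producers[key])

    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(f"circular dependency: {exc.args[1]}") from exc


order = validate_pipeline(
    [
        Stage("report", reads={"rows"}, writes={"summary"}),
        Stage("fetch", reads={"url"}, writes={"html"}),
        Stage("parse", reads={"html"}, writes={"rows"}),
    ],
    initial_inputs={"url"},
)
```

The stages are declared out of order on purpose: the sort recovers fetch, parse, report from the declared reads and writes alone.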
Multi-eval scoring
Instead of a single LLM evaluation, Engramia runs N independent evaluations in parallel:
- Results are aggregated using the median (robust to outliers)
- Variance > 1.5 triggers a warning — evaluators disagree significantly
- Feedback is taken from the lowest-scoring run (the most useful for improvement)
- Adversarial detection catches hardcoded outputs
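The aggregation can be sketched with a thread pool and the statistics module. The evaluators here are plain callables standing in for LLM calls, the EvalRun and Aggregate types are hypothetical, and the use of population variance is an assumption; adversarial detection is not shown.

```python
import statistics
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable

VARIANCE_WARNING = 1.5


@dataclass
class EvalRun:
    score: float
    feedback: str


@dataclass
class Aggregate:
    score: float          # median of the individual scores
    variance: float
    high_variance: bool   # True when evaluators disagree significantly
    feedback: str         # taken from the lowest-scoring run


def multi_evaluate(
    evaluators: list[Callable[[str], EvalRun]], output: str
) -> Aggregate:
    """Run every evaluator concurrently and aggregate the results."""
    if not evaluators:
        raise ValueError("at least one evaluator is required")
    with ThreadPoolExecutor(max_workers=len(evaluators)) as pool:
        runs = list(pool.map(lambda evaluate: evaluate(output), evaluators))
    scores = [run.score for run in runs]
    variance = statistics.pvariance(scores)
    worst = min(runs, key=lambda run: run.score)
    return Aggregate(
        score=statistics.median(scores),
        variance=variance,
        high_variance=variance > VARIANCE_WARNING,
        feedback=worst.feedback,
    )


# Stub evaluators with fixed answers, standing in for independent LLM calls.
result = multi_evaluate(
    [
        lambda output: EvalRun(8.0, "solid overall"),
        lambda output: EvalRun(7.5, "minor naming issues"),
        lambda output: EvalRun(3.0, "no handling of empty input"),
    ],
    output="def f(xs): return xs[0]",
)
```

The median keeps the single low score from dragging the result down, while the variance flag and the feedback from the lowest-scoring run still surface the disagreement.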
Architecture
engramia/
├── memory.py               # Memory facade (public API)
├── types.py                # Pydantic models (Pattern, Match, EvalResult, ...)
├── exceptions.py           # EngramiaError hierarchy
├── _util.py                # Shared utilities
├── _factory.py             # Provider factory (REST API + MCP)
│
├── core/                   # Internal stores
│   ├── success_patterns.py # Pattern storage, aging, reuse tracking
│   ├── eval_store.py       # Eval results, quality multiplier
│   ├── eval_feedback.py    # Feedback clustering + decay
│   ├── metrics.py          # Run statistics
│   └── skill_registry.py   # Capability-based tagging
│
├── reuse/                  # Reuse engine
│   ├── matcher.py          # Semantic search + eval weighting
│   ├── composer.py         # LLM pipeline decomposition
│   └── contracts.py        # Data-flow validation + cycle detection
│
├── eval/
│   └── evaluator.py        # MultiEvaluator (concurrent, median, variance)
│
├── providers/              # Pluggable backends
│   ├── base.py             # ABC: LLMProvider, EmbeddingProvider, StorageBackend
│   ├── openai.py           # OpenAI LLM + embeddings
│   ├── anthropic.py        # Anthropic/Claude LLM
│   ├── local_embeddings.py # sentence-transformers (no API key)
│   ├── json_storage.py     # JSON storage (thread-safe, atomic writes)
│   └── postgres.py         # PostgreSQL + pgvector (HNSW index)
│
├── api/                    # REST API
│   ├── app.py              # FastAPI app factory
│   ├── routes.py           # All endpoints
│   ├── auth.py             # Bearer token middleware
│   ├── middleware.py       # Security headers, rate limiting, body size
│   ├── audit.py            # Structured audit logging
│   ├── deps.py             # Dependency injection
│   └── schemas.py          # Request/response models
│
├── evolution/              # Self-improvement
│   ├── prompt_evolver.py   # LLM-based prompt improvement
│   └── failure_cluster.py  # Failure pattern clustering
│
├── sdk/                    # Framework integrations
│   ├── langchain.py        # LangChain callback
│   └── webhook.py          # HTTP SDK client
│
├── cli/                    # CLI (Typer + Rich)
│   └── main.py
│
├── mcp/                    # MCP server
│   └── server.py
│
└── db/                     # Database
    ├── models.py           # SQLAlchemy models
    └── migrations/         # Alembic migrations
Provider abstraction
Engramia uses abstract base classes for all external dependencies:
- LLMProvider: generate(prompt) -> str. Used by evaluate, compose, evolve.
- EmbeddingProvider: embed(texts) -> list[list[float]]. Used by learn and recall.
- StorageBackend: save/load/delete/search_similar. JSON or PostgreSQL.
This means you can swap providers without changing any application code.
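To illustrate the idea, here is a minimal sketch of two of the interfaces with toy implementations. Only the generate and embed signatures come from the list above; the concrete classes and the summarize helper are hypothetical, and StorageBackend is omitted because its method signatures are not given here.

```python
from abc import ABC, abstractmethod


class LLMProvider(ABC):
    @abstractmethod
    def generate(self, prompt: str) -> str: ...


class EmbeddingProvider(ABC):
    @abstractmethod
    def embed(self, texts: list[str]) -> list[list[float]]: ...


class EchoLLM(LLMProvider):
    """Toy provider that returns the prompt, handy for offline tests."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


class UppercaseLLM(LLMProvider):
    """A second toy provider, used to show that callers do not change."""

    def generate(self, prompt: str) -> str:
        return prompt.upper()


class LengthEmbeddings(EmbeddingProvider):
    """Toy embedding: character count and space count of each text."""

    def embed(self, texts: list[str]) -> list[list[float]]:
        return [[float(len(text)), float(text.count(" "))] for text in texts]


def summarize(llm: LLMProvider, text: str) -> str:
    """Application code depends only on the abstract interface."""
    return llm.generate(f"Summarize: {text}")


first = summarize(EchoLLM(), "agent memory")
second = summarize(UppercaseLLM(), "agent memory")
vectors = LengthEmbeddings().embed(["agent memory", "recall"])
```

The summarize function is identical for both providers; only the object passed in changes.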
Origin
Engramia was extracted from Agent Factory V2 — a self-improving AI agent factory. The factory remains as an open-source reference implementation proving the memory system works in practice.