Python API Reference
The entire public API is exposed through the Memory class.
Memory
Constructor
| Parameter | Required | Description |
|---|---|---|
| llm | No | LLM provider, required for evaluate(), compose(), and evolve_prompt(). With None, only learn/recall are available. |
| embeddings | Yes | Embedding provider for semantic search |
| storage | Yes | Storage backend (JSON or PostgreSQL) |
learn()
Store a successful agent run as a success pattern.
| Parameter | Type | Description |
|---|---|---|
| task | str | What the agent was asked to do (max 10,000 chars) |
| code | str | The code/solution produced (max 500,000 chars) |
| eval_score | float | Quality rating, 0–10 |
| output | str \| None | Agent stdout/output (optional) |
Returns: LearnResult with .stored (bool) and .pattern_count (int).
Raises: ValidationError for invalid inputs.
result = mem.learn(
    task="Parse CSV and compute statistics",
    code="import csv\nimport statistics\n...",
    eval_score=8.5,
    output="mean=42.3, std=7.1",
)
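The input limits in the parameter table can be pictured as a small pre-check. This is a minimal sketch, not Engramia's actual validation code: `check_learn_inputs` and the local `ValidationError` class are hypothetical stand-ins, and rejecting an empty task is inferred from the Exceptions table rather than from the learn() description.

```python
# Hypothetical stand-in for the library's ValidationError.
class ValidationError(ValueError):
    pass


MAX_TASK_CHARS = 10_000    # limit documented for `task`
MAX_CODE_CHARS = 500_000   # limit documented for `code`


def check_learn_inputs(task: str, code: str, eval_score: float) -> None:
    """Raise ValidationError if the inputs break the documented limits."""
    if not task.strip():
        raise ValidationError("task must not be empty")
    if len(task) > MAX_TASK_CHARS:
        raise ValidationError(f"task exceeds {MAX_TASK_CHARS} chars")
    if len(code) > MAX_CODE_CHARS:
        raise ValidationError(f"code exceeds {MAX_CODE_CHARS} chars")
    if not 0.0 <= eval_score <= 10.0:
        raise ValidationError("eval_score must be within 0-10")


# Passes silently for well-formed input.
check_learn_inputs("Parse CSV and compute statistics", "import csv", 8.5)
```

Running a check like this before calling learn() lets a caller reject oversized agent output early instead of catching the exception afterwards.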
recall()
mem.recall(
    task: str,
    limit: int = 5,
    deduplicate: bool = True,
    eval_weighted: bool = True,
    recency_weight: float = 0.0,
    recency_half_life_days: float = 30.0,
) -> list[Match]
Find relevant success patterns for a new task via semantic search.
| Parameter | Type | Default | Description |
|---|---|---|---|
| task | str | (required) | The task to search for |
| limit | int | 5 | Max results to return |
| deduplicate | bool | True | Group similar tasks (Jaccard > 0.7), return only top-scoring per group |
| eval_weighted | bool | True | Multiply similarity by eval quality multiplier [0.5, 1.0] |
| recency_weight | float | 0.0 | Bias toward recently-stored patterns via exponential half-life decay on Pattern.timestamp. 0.0 = off (no behaviour change), 1.0 = full decay, intermediate values soften the effect via recency_factor ** recency_weight. Multiplies with eval_weighted when both are active. |
| recency_half_life_days | float | 30.0 | Half-life of the recency decay, in days. A pattern this many days old contributes a recency_factor of 0.5; twice that, 0.25. Ignored when recency_weight == 0. |
Returns: list[Match] sorted by effective score descending.
Each Match contains:
| Field | Type | Description |
|---|---|---|
| similarity | float | Cosine similarity (0.0–1.0) |
| effective_score | float \| None | Rank-ordering score produced by recall when any non-similarity signal is active (eval_weighted=True and/or recency_weight>0); None on the plain similarity path |
| reuse_tier | str | "duplicate", "adapt", or "fresh" |
| pattern_key | str | Storage key for delete_pattern() |
| pattern | Pattern | Full pattern with task, design, success_score, reuse_count, timestamp |
matches = mem.recall(task="Read CSV and calculate averages", limit=5)
for m in matches:
    print(f"{m.similarity:.2f} | {m.pattern.task}")
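To make the deduplicate option more concrete, here is a minimal sketch of grouping by Jaccard similarity with the documented 0.7 threshold. The tokenisation (lowercase, whitespace split) and the greedy best-first grouping are assumptions for illustration; the library's actual procedure is not described here, and `jaccard` and `dedupe` are hypothetical helper names.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two task strings over lowercase word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def dedupe(scored_tasks: list[tuple[str, float]], threshold: float = 0.7) -> list[tuple[str, float]]:
    """Keep the top-scoring task from each group of near-identical tasks."""
    kept: list[tuple[str, float]] = []
    # Visit best-first so the first member seen in a group is its top scorer.
    for task, score in sorted(scored_tasks, key=lambda ts: ts[1], reverse=True):
        if all(jaccard(task, other) <= threshold for other, _ in kept):
            kept.append((task, score))
    return kept


candidates = [
    ("parse csv file and compute statistics", 0.91),
    ("parse csv file and compute summary statistics", 0.88),  # Jaccard 6/7 with the first
    ("send a slack notification", 0.40),
]
unique = dedupe(candidates)  # the second candidate is folded into the first
```

With deduplicate=False the equivalent of this step is skipped, so near-identical tasks can occupy several of the limit slots.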
Recency-aware recall
For workloads where stale patterns should lose rank over time (codebase
refactors, deprecated APIs, post-incident rewrites), pass
recency_weight > 0:
# Prefer patterns stored in the last couple of weeks:
recent = mem.recall(
    task="Apply the current auth middleware",
    recency_weight=1.0,
    recency_half_life_days=14.0,
)
The blended formula is
recency_factor = 0.5 ** (max(0, now - pattern.timestamp) / (H * 86400))
effective_score = similarity × quality_factor × recency_factor ** recency_weight
where quality_factor is the eval multiplier when eval_weighted=True
(else 1) and H is recency_half_life_days. Future-dated timestamps
(clock skew) are clamped to age=0 so they do not award a >1 boost,
matching the behaviour of SuccessPatternStore.run_aging.
recency_weight=0.0 is a strict no-op and preserves pre-0.6.7 output
byte-for-byte.
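The blended formula translates directly into a few lines of Python. This is a sketch for building intuition, not the library's implementation: the function names are hypothetical, and quality_factor is taken as an input because the mapping from eval score to the [0.5, 1.0] multiplier is not specified in this reference.

```python
DAY = 86400  # seconds per day


def recency_factor(age_seconds: float, half_life_days: float) -> float:
    """0.5 ** (age / half-life), with negative ages clamped to zero."""
    age = max(0.0, age_seconds)  # future-dated timestamps get no boost
    return 0.5 ** (age / (half_life_days * DAY))


def effective_score(
    similarity: float,
    quality_factor: float,
    age_seconds: float,
    recency_weight: float = 0.0,
    half_life_days: float = 30.0,
) -> float:
    """similarity * quality_factor * recency_factor ** recency_weight."""
    factor = recency_factor(age_seconds, half_life_days)
    return similarity * quality_factor * factor ** recency_weight


# One half-life old, full decay: the score is halved.
print(effective_score(0.8, 1.0, 14 * DAY, recency_weight=1.0, half_life_days=14.0))  # 0.4
# recency_weight=0.0 leaves the score untouched regardless of age.
print(effective_score(0.8, 1.0, 365 * DAY, recency_weight=0.0))  # 0.8
```

An intermediate weight such as 0.5 takes the square root of the decay factor, so a pattern one half-life old keeps about 71% of its score instead of 50%.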
evaluate()
mem.evaluate(
    task: str,
    code: str,
    output: str | None = None,
    num_evals: int = 3,
    *,
    pattern_key: str | None = None,
) -> EvalResult
Run N independent LLM evaluations and aggregate results.
Requires: llm provider configured.
| Parameter | Type | Default | Description |
|---|---|---|---|
| task | str | (required) | Task description |
| code | str | (required) | Code to evaluate |
| output | str \| None | None | Agent output |
| num_evals | int | 3 | Number of parallel evaluations (1–10) |
| pattern_key | str \| None | None | Pattern identifier to attach this evaluation to. When set, the result feeds directly into eval_weighted recall for that specific pattern, closing the learn → evaluate → improve loop. When None (default), the result is keyed by sha256(code)[:12], preserving the pre-0.6.8 behaviour for free-floating code not tied to a stored pattern. |
Returns: EvalResult with:
| Field | Type | Description |
|---|---|---|
| median_score | float | Aggregated score (0–10) |
| variance | float | Score variance across runs |
| high_variance | bool | True if variance > 1.5 |
| feedback | str | Feedback from the worst run |
| adversarial_detected | bool | True if hardcoded output detected |
Raises: ProviderError if no LLM configured. ValidationError if pattern_key is provided but no pattern exists under that key.
# Evaluate a stored pattern; the result feeds into its future recall ranking:
matches = mem.recall("Parse CSV", limit=1)
result = mem.evaluate(
    task="Parse CSV",
    code=matches[0].pattern.design["code"],
    pattern_key=matches[0].pattern_key,
)
# Or evaluate free-floating code, keyed by sha256(code)[:12]:
result = mem.evaluate(task="Parse CSV", code=candidate_code)
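The aggregation into EvalResult fields can be sketched with the standard library. Several details below are assumptions, since this reference does not pin them down: population variance (`statistics.pvariance`), "worst run" meaning the lowest-scoring run, and UTF-8 plus hex digest for the sha256(code)[:12] key. `Aggregate`, `aggregate`, and `default_eval_key` are hypothetical names, and adversarial detection is left out.

```python
import hashlib
import statistics
from dataclasses import dataclass


@dataclass
class Aggregate:  # hypothetical stand-in for EvalResult
    median_score: float
    variance: float
    high_variance: bool
    feedback: str


def aggregate(runs: list[tuple[float, str]]) -> Aggregate:
    """Fold (score, feedback) pairs from independent runs into one result."""
    scores = [score for score, _ in runs]
    variance = statistics.pvariance(scores)  # assumption: population variance
    worst_feedback = min(runs, key=lambda run: run[0])[1]  # assumption: worst = lowest score
    return Aggregate(
        median_score=statistics.median(scores),
        variance=variance,
        high_variance=variance > 1.5,  # threshold from the EvalResult table
        feedback=worst_feedback,
    )


def default_eval_key(code: str) -> str:
    """sha256(code)[:12]; UTF-8 encoding and hex digest are assumptions."""
    return hashlib.sha256(code.encode("utf-8")).hexdigest()[:12]


result = aggregate([(8.0, "clear"), (7.0, "no error handling"), (9.0, "good")])
```

Using the median rather than the mean keeps a single outlier run from dragging the aggregate, while the variance flag still reports that the runs disagreed.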
refine_pattern()
mem.refine_pattern(
    pattern_key: str,
    eval_score: float,
    *,
    task: str | None = None,
    feedback: str = "",
) -> None
Record a new quality observation against an existing pattern without
running an LLM evaluation. Appends an entry to the eval store so the
next eval_weighted recall call picks up the updated evidence.
Typical callers: downstream task succeeded / failed; user rated a pattern via a UI; an offline eval pipeline produced a score and wants it reflected in the live memory.
| Parameter | Type | Default | Description |
|---|---|---|---|
| pattern_key | str | (required) | The storage key of the pattern to refine. Obtain from Match.pattern_key on a prior recall(). |
| eval_score | float | (required) | New quality observation, [0.0, 10.0]. The eval-weighted multiplier reads the most recent observation for this key. |
| task | str \| None | None | Optional task description attached to the eval record; defaults to the pattern's own task field. |
| feedback | str | "" | Optional free-form note. Not consulted by ranking, but surfaces in get_feedback and evolution pipelines. |
Returns: None. Raises: ValidationError if the key does not exist or the score is out of range.
Does not mutate Pattern.success_score — survival signals
(reuse_count, success_score, aging) remain orthogonal to ranking.
See concepts.md for the full survival-vs-ranking model.
# Downstream task used a pattern and succeeded, so record positive evidence:
matches = mem.recall("Parse CSV", limit=1)
mem.refine_pattern(matches[0].pattern_key, 9.0, feedback="shipped to prod")
# The next eval-weighted recall call already sees the new observation:
matches_after = mem.recall("Parse CSV", limit=1, eval_weighted=True)
# matches_after[0].effective_score now reflects the 9.0 score
# (higher than before if the previous observation was below 9.0)
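The "append an entry, read the most recent one" behaviour described for refine_pattern() can be modelled with an append-only log. This is an illustrative sketch only: `EvalStore` and its methods are hypothetical, and the real eval store's layout is not documented here.

```python
from __future__ import annotations

from collections import defaultdict


class EvalStore:
    """Hypothetical append-only log of quality observations per pattern key."""

    def __init__(self) -> None:
        self._log: dict[str, list[tuple[float, str]]] = defaultdict(list)

    def append(self, pattern_key: str, eval_score: float, feedback: str = "") -> None:
        """Record a new observation; earlier entries are kept, not overwritten."""
        if not 0.0 <= eval_score <= 10.0:
            raise ValueError("eval_score must be within [0.0, 10.0]")
        self._log[pattern_key].append((eval_score, feedback))

    def latest_score(self, pattern_key: str) -> float | None:
        """Score that ranking would read: the newest entry, or None if absent."""
        entries = self._log.get(pattern_key)
        return entries[-1][0] if entries else None


store = EvalStore()
store.append("k1", 6.0)
store.append("k1", 9.0, feedback="shipped to prod")
print(store.latest_score("k1"))  # 9.0
```

Because ranking reads only the newest entry in this model, a single low score after a string of high ones immediately lowers the pattern's rank; callers that want smoothing would need to average before calling refine_pattern().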
compose()
Decompose a task into a staged pipeline from existing success patterns.
Requires: llm provider configured.
Returns: Pipeline with:
| Field | Type | Description |
|---|---|---|
| stages | list[Stage] | Pipeline stages with task, reads, writes |
| valid | bool | Whether contract validation passed |
| contract_errors | list[str] | Validation errors (if any) |
Raises: ProviderError if no LLM configured.
pipeline = mem.compose(task="Fetch data, analyze, write report")
for stage in pipeline.stages:
    print(f"[{stage.task}] reads={stage.reads} writes={stage.writes}")
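This reference does not spell out what contract validation checks. One plausible reading, shown here purely as an assumption, is that every name a stage reads must have been written by an earlier stage. The `Stage` dataclass and `check_contracts` below are hypothetical stand-ins that mirror only the documented field names.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:  # hypothetical stand-in mirroring the documented fields
    task: str
    reads: list[str] = field(default_factory=list)
    writes: list[str] = field(default_factory=list)


def check_contracts(stages: list[Stage]) -> list[str]:
    """Return one error per input that no earlier stage produced."""
    available: set[str] = set()
    errors: list[str] = []
    for stage in stages:
        for name in stage.reads:
            if name not in available:
                errors.append(f"{stage.task!r} reads {name!r} before it is written")
        available.update(stage.writes)
    return errors


stages = [
    Stage("fetch", writes=["data.json"]),
    Stage("analyze", reads=["data.json"], writes=["stats.json"]),
    Stage("report", reads=["stats.json", "chart.png"]),  # chart.png is never written
]
errors = check_contracts(stages)
valid = not errors
```

Under this reading, a Pipeline with valid == False would carry messages like the one produced for chart.png in contract_errors.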
get_feedback()
Get recurring feedback patterns for prompt injection.
Returns only feedback with count >= 2, sorted by frequency and freshness.
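The filter-and-sort rule can be sketched as follows. The record shape and the tie-breaking order (count first, then recency) are assumptions; the reference says only that results are sorted "by frequency and freshness". `FeedbackItem` and `recurring_feedback` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FeedbackItem:  # hypothetical record; field names are illustrative
    text: str
    count: int
    last_seen: float  # Unix timestamp of the latest occurrence


def recurring_feedback(items: list[FeedbackItem], min_count: int = 2) -> list[str]:
    """Drop one-off feedback, then order by count and recency, highest first."""
    recurring = [item for item in items if item.count >= min_count]
    recurring.sort(key=lambda item: (item.count, item.last_seen), reverse=True)
    return [item.text for item in recurring]


items = [
    FeedbackItem("add error handling", count=5, last_seen=1_700_000_000),
    FeedbackItem("missing type hints", count=2, last_seen=1_700_500_000),
    FeedbackItem("typo in docstring", count=1, last_seen=1_700_900_000),
    FeedbackItem("no input validation", count=2, last_seen=1_700_800_000),
]
prompt_hints = recurring_feedback(items)
```

The resulting list is what a caller would typically join into the system prompt of the next agent run.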
delete_pattern()
Permanently delete a stored pattern. Returns True if the pattern existed.
run_aging()
Apply time-decay to all success patterns. Returns the number of pruned patterns.
- Decay: success_score *= 0.98 ^ weeks
- Patterns with score < 0.1 are removed
- Run periodically (e.g., weekly cron)
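The decay and pruning rules above amount to the following arithmetic. This is a simplified sketch: `age_patterns` is a hypothetical helper, and it assumes a single `weeks` value (time since the previous aging run) applies to every pattern, whereas a real store would presumably track age per pattern.

```python
def age_patterns(
    scores: dict[str, float],
    weeks: float,
    decay: float = 0.98,
    floor: float = 0.1,
) -> tuple[dict[str, float], int]:
    """Decay every score by decay ** weeks and drop those below the floor."""
    decayed = {key: score * decay ** weeks for key, score in scores.items()}
    survivors = {key: score for key, score in decayed.items() if score >= floor}
    return survivors, len(decayed) - len(survivors)


scores = {"csv_stats": 8.0, "old_scraper": 0.1}
survivors, pruned = age_patterns(scores, weeks=10)
# 0.98 ** 10 is roughly 0.817, so csv_stats drops to about 6.54
# and old_scraper falls below 0.1 and is pruned.
```

At 2% per week the decay is gentle: a score takes about 34 weeks to halve, so pruning mainly removes patterns that started low or have gone unused for a long time.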
run_feedback_decay()
Apply time-decay to feedback clusters (10% per week).
metrics
Current memory instance metrics.
| Field | Type | Description |
|---|---|---|
| runs | int | Total recorded runs |
| success_rate | float | Proportion of successful runs |
| avg_eval_score | float \| None | Average eval score |
| pattern_count | int | Current number of patterns |
| pipeline_reuse | int | Runs where an existing pattern was reused |
evolve_prompt()
Generate an improved prompt based on recurring quality issues.
Requires: llm provider configured.
result = mem.evolve_prompt(role="coder", current_prompt="You are a coder...")
if result.accepted:
    print(result.improved_prompt)
analyze_failures()
Cluster recurring errors to identify systemic problems.
clusters = mem.analyze_failures(min_count=2)
for c in clusters:
    print(f"{c.representative} (count={c.total_count})")
register_skills() / find_by_skills()
mem.register_skills(pattern_key: str, skills: list[str]) -> None
mem.find_by_skills(required: list[str], match_all: bool = True) -> list[Match]
Tag patterns with capabilities and search by skills.
mem.register_skills(key, ["csv_parsing", "statistics"])
results = mem.find_by_skills(["csv_parsing"], match_all=True)
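The match_all flag is most easily understood as a choice between set containment and set overlap. The sketch below works on a plain dict and uses the hypothetical name `keys_with_skills`; treating match_all=False as "at least one required skill" is an assumption, since this reference shows only the True case.

```python
def keys_with_skills(
    index: dict[str, set[str]],
    required: list[str],
    match_all: bool = True,
) -> list[str]:
    """Return pattern keys whose skill tags satisfy the requirement."""
    wanted = set(required)
    if match_all:
        # Every required skill must be tagged on the pattern.
        return [key for key, skills in index.items() if wanted <= skills]
    # Assumption: otherwise a single shared skill is enough.
    return [key for key, skills in index.items() if wanted & skills]


index = {
    "p1": {"csv_parsing", "statistics"},
    "p2": {"csv_parsing"},
    "p3": {"http"},
}
both = keys_with_skills(index, ["csv_parsing", "statistics"])                    # ['p1']
either = keys_with_skills(index, ["csv_parsing", "statistics"], match_all=False)  # ['p1', 'p2']
```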
export() / import_data()
Backup and migrate patterns (JSONL-compatible).
# Export
records = mem.export()
# Import into a new instance
imported = new_mem.import_data(records)
print(f"Imported {imported} patterns")
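Since the exported records are described as JSONL-compatible, a backup file can be written one JSON object per line. The helpers below are a sketch under the assumption that export() returns a list of JSON-serialisable dicts; `write_jsonl` and `read_jsonl` are hypothetical names, not part of the library.

```python
import json
from pathlib import Path


def write_jsonl(records: list[dict], path: Path) -> None:
    """Write one JSON object per line."""
    with path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> list[dict]:
    """Read records back, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

A backup then becomes write_jsonl(mem.export(), Path("backup.jsonl")), and a restore is new_mem.import_data(read_jsonl(Path("backup.jsonl"))).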
Exceptions
| Exception | When |
|---|---|
| EngramiaError | Base exception for all Engramia errors |
| ProviderError | LLM provider not configured or call failed |
| ValidationError | Invalid input (empty task, score out of range, etc.) |
| StorageError | Storage backend error (file I/O, database) |
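Because every error derives from EngramiaError, callers can handle specific failures first and fall back to the base class. The classes below are local stand-ins that mirror the table (the real import path is not given in this reference), and the suggested reactions in `suggested_action` are one possible policy, not library behaviour.

```python
# Local stand-ins mirroring the documented hierarchy.
class EngramiaError(Exception):
    """Base exception for all Engramia errors."""


class ProviderError(EngramiaError):
    pass


class ValidationError(EngramiaError):
    pass


class StorageError(EngramiaError):
    pass


def suggested_action(exc: Exception) -> str:
    """Map an exception to a coarse handling policy (illustrative only)."""
    if isinstance(exc, ValidationError):
        return "fix input"      # retrying the same call cannot succeed
    if isinstance(exc, (ProviderError, StorageError)):
        return "retry later"    # may be transient: provider call or backend I/O
    if isinstance(exc, EngramiaError):
        return "report"         # unknown library error
    return "re-raise"           # not an Engramia error at all
```

Checking the specific subclasses before the base class matters: an isinstance test against EngramiaError first would match every row in the table.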