Python API Reference

The entire public API is accessed through the Memory class.

from engramia import Memory

Memory

Constructor

Memory(
    llm: LLMProvider | None = None,
    embeddings: EmbeddingProvider,
    storage: StorageBackend,
)

Parameter	Required	Description
`llm`	No	LLM provider for evaluate, compose, evolve. `None` = learn/recall only.
`embeddings`	Yes	Embedding provider for semantic search
`storage`	Yes	Storage backend (JSON or PostgreSQL)

learn()

mem.learn(
    task: str,
    code: str,
    eval_score: float,
    output: str | None = None,
) -> LearnResult

Store a successful agent run as a success pattern.

Parameter	Type	Description
`task`	`str`	What the agent was asked to do (max 10,000 chars)
`code`	`str`	The code/solution produced (max 500,000 chars)
`eval_score`	`float`	Quality rating, 0–10
`output`	`str | None`	Agent stdout/output (optional)

Returns: LearnResult with .stored (bool) and .pattern_count (int).

Raises: ValidationError for invalid inputs.

result = mem.learn(
    task="Parse CSV and compute statistics",
    code="import csv\nimport statistics\n...",
    eval_score=8.5,
    output="mean=42.3, std=7.1",
)

recall()

mem.recall(
    task: str,
    limit: int = 5,
    deduplicate: bool = True,
    eval_weighted: bool = True,
    recency_weight: float = 0.0,
    recency_half_life_days: float = 30.0,
) -> list[Match]

Find relevant success patterns for a new task via semantic search.

Parameter	Type	Default	Description
`task`	`str`	—	The task to search for
`limit`	`int`	`5`	Max results to return
`deduplicate`	`bool`	`True`	Group similar tasks (Jaccard > 0.7), return only top-scoring per group
`eval_weighted`	`bool`	`True`	Multiply similarity by eval quality multiplier [0.5, 1.0]
`recency_weight`	`float`	`0.0`	Bias toward recently-stored patterns via exponential half-life decay on `Pattern.timestamp`. `0.0` = off (no behaviour change), `1.0` = full decay, intermediate values soften the effect via `recency_factor ** recency_weight`. Multiplies with `eval_weighted` when both are active.
`recency_half_life_days`	`float`	`30.0`	Half-life of the recency decay, in days. A pattern this many days old contributes a `recency_factor` of 0.5; twice that, 0.25. Ignored when `recency_weight == 0`.

Returns: list[Match] sorted by effective score descending.

Each Match contains:

Field	Type	Description
`similarity`	`float`	Cosine similarity (0.0–1.0)
`effective_score`	`float | None`	Rank-ordering score produced by recall when any non-similarity signal is active (`eval_weighted=True` and/or `recency_weight>0`); `None` on the plain similarity path
`reuse_tier`	`str`	`"duplicate"`, `"adapt"`, or `"fresh"`
`pattern_key`	`str`	Storage key for `delete_pattern()`
`pattern`	`Pattern`	Full pattern with task, design, success_score, reuse_count, timestamp

matches = mem.recall(task="Read CSV and calculate averages", limit=5)
for m in matches:
    print(f"{m.similarity:.2f} | {m.pattern.task}")
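The `deduplicate` option groups near-identical tasks by Jaccard similarity and keeps the best match per group. The sketch below illustrates the idea only: it assumes the similarity is computed over lowercase whitespace-separated word sets and that grouping is greedy, neither of which is confirmed by this reference. The `jaccard` and `deduplicate` helpers are hypothetical and are not part of the Engramia API.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two tasks over their lowercase word sets (assumed tokenization)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def deduplicate(candidates: list[tuple[str, float]], threshold: float = 0.7) -> list[tuple[str, float]]:
    """Keep the top-scoring candidate from each group of similar tasks (greedy, assumed)."""
    kept: list[tuple[str, float]] = []
    # Visit candidates best-first so the first member of each group is its top scorer.
    for task, score in sorted(candidates, key=lambda c: c[1], reverse=True):
        if all(jaccard(task, kept_task) <= threshold for kept_task, _ in kept):
            kept.append((task, score))
    return kept


candidates = [
    ("parse csv file and compute statistics", 0.91),
    ("parse csv file and compute statistics quickly", 0.88),  # 6 of 7 words shared
    ("send email report", 0.40),
]
unique = deduplicate(candidates)
for task, score in unique:
    print(f"{score:.2f} | {task}")
```

In this example the second candidate shares six of seven distinct words with the first (Jaccard about 0.86), so only the higher-scoring one survives.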

Recency-aware recall

For workloads where stale patterns should lose rank over time — codebase refactors, deprecated APIs, post-incident rewrites — pass recency_weight > 0:

# Prefer patterns stored in the last couple of weeks:
recent = mem.recall(
    task="Apply the current auth middleware",
    recency_weight=1.0,
    recency_half_life_days=14.0,
)

The blended formula is

recency_factor  = 0.5 ** (max(0, now - pattern.timestamp) / (H * 86400))
effective_score = similarity × quality_factor × recency_factor ** recency_weight

where quality_factor is the eval multiplier when eval_weighted=True (else 1) and H is recency_half_life_days. Future-dated timestamps (clock skew) are clamped to age=0 so they do not award a >1 boost, matching the behaviour of SuccessPatternStore.run_aging.

recency_weight=0.0 is a strict no-op and preserves pre-0.6.7 output byte-for-byte.
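The blended formula above can be reproduced in plain Python to sanity-check a choice of `recency_weight` and `recency_half_life_days` before using them. This is a standalone sketch of the arithmetic, not the library's implementation; the function names and the fixed `now` value are illustrative.

```python
import time
from typing import Optional

SECONDS_PER_DAY = 86400


def recency_factor(timestamp: float, half_life_days: float, now: Optional[float] = None) -> float:
    """Exponential half-life decay; future-dated timestamps are clamped to age 0."""
    now = time.time() if now is None else now
    age_seconds = max(0.0, now - timestamp)
    return 0.5 ** (age_seconds / (half_life_days * SECONDS_PER_DAY))


def effective_score(
    similarity: float,
    quality_factor: float,
    timestamp: float,
    recency_weight: float = 0.0,
    half_life_days: float = 30.0,
    now: Optional[float] = None,
) -> float:
    """similarity x quality_factor x recency_factor ** recency_weight."""
    factor = recency_factor(timestamp, half_life_days, now)
    return similarity * quality_factor * factor ** recency_weight


now = 1_700_000_000.0  # fixed reference time so the example is reproducible
fresh = effective_score(0.80, 1.0, now, recency_weight=1.0, half_life_days=14.0, now=now)
two_weeks_old = effective_score(
    0.80, 1.0, now - 14 * SECONDS_PER_DAY, recency_weight=1.0, half_life_days=14.0, now=now
)
print(fresh, two_weeks_old)
```

With a 14-day half-life and full weight, a pattern exactly 14 days old keeps half of its score, while `recency_weight=0.0` leaves the score unchanged regardless of age.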

evaluate()

mem.evaluate(
    task: str,
    code: str,
    output: str | None = None,
    num_evals: int = 3,
    *,
    pattern_key: str | None = None,
) -> EvalResult

Run N independent LLM evaluations and aggregate results.

Requires: llm provider configured.

Parameter	Type	Default	Description
`task`	`str`	—	Task description
`code`	`str`	—	Code to evaluate
`output`	`str | None`	`None`	Agent output
`num_evals`	`int`	`3`	Number of parallel evaluations (1–10)
`pattern_key`	`str | None`	`None`	Pattern identifier to attach this evaluation to. When set, the result feeds directly into `eval_weighted` recall for that specific pattern, closing the learn → evaluate → improve loop. When `None` (default), the result is keyed by `sha256(code)[:12]`, preserving the pre-0.6.8 behaviour for free-floating code not tied to a stored pattern.
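For reference, a key of the form `sha256(code)[:12]` can be computed with the standard library. The sketch below assumes the code is UTF-8 encoded and that the key is the first 12 characters of the hexadecimal digest; the reference does not state either detail, and `default_eval_key` is a hypothetical helper name.

```python
import hashlib


def default_eval_key(code: str) -> str:
    """First 12 hex characters of the SHA-256 digest of the code (encoding assumed UTF-8)."""
    return hashlib.sha256(code.encode("utf-8")).hexdigest()[:12]


key = default_eval_key("import csv\nimport statistics\n")
print(key)
```

Because the key depends only on the code text, evaluating the same code twice lands under the same key, and any edit to the code produces a different one.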

Returns: EvalResult with:

Field	Type	Description
`median_score`	`float`	Aggregated score (0–10)
`variance`	`float`	Score variance across runs
`high_variance`	`bool`	`True` if variance > 1.5
`feedback`	`str`	Feedback from the worst run
`adversarial_detected`	`bool`	`True` if hardcoded output detected

Raises: ProviderError if no LLM configured. ValidationError if pattern_key is provided but no pattern exists under that key.
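The fields of `EvalResult` suggest how the N runs are combined. The following sketch shows one way such an aggregation could work, using a stand-in `EvalSummary` dataclass. It assumes population variance and takes the "worst run" to be the lowest-scoring one; the library may differ on both points.

```python
import statistics
from dataclasses import dataclass

HIGH_VARIANCE_THRESHOLD = 1.5  # from the `high_variance` field description


@dataclass
class EvalSummary:
    """Hypothetical stand-in for EvalResult, limited to the aggregated fields."""
    median_score: float
    variance: float
    high_variance: bool
    feedback: str


def aggregate(runs: list[tuple[float, str]]) -> EvalSummary:
    """Combine (score, feedback) pairs from independent evaluation runs."""
    scores = [score for score, _ in runs]
    variance = statistics.pvariance(scores)  # assumption: population variance
    _, worst_feedback = min(runs, key=lambda run: run[0])  # lowest-scoring run
    return EvalSummary(
        median_score=statistics.median(scores),
        variance=variance,
        high_variance=variance > HIGH_VARIANCE_THRESHOLD,
        feedback=worst_feedback,
    )


summary = aggregate([
    (8.0, "Looks fine."),
    (7.0, "Handle empty files."),
    (3.0, "Crashes on missing column."),
])
print(summary)
```

Here the median hides the outlier, but the variance flag and the worst-run feedback still surface it, which is a reason to check `high_variance` before trusting `median_score` on its own.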

# Evaluate a stored pattern — result feeds into its future recall ranking:
matches = mem.recall("Parse CSV", limit=1)
result = mem.evaluate(
    task="Parse CSV",
    code=matches[0].pattern.design["code"],
    pattern_key=matches[0].pattern_key,
)

# Or evaluate free-floating code — keyed by sha256(code):
result = mem.evaluate(task="Parse CSV", code=candidate_code)

refine_pattern()

mem.refine_pattern(
    pattern_key: str,
    eval_score: float,
    *,
    task: str | None = None,
    feedback: str = "",
) -> None

Record a new quality observation against an existing pattern without running an LLM evaluation. Appends an entry to the eval store so the next eval_weighted recall call picks up the updated evidence.

Typical uses: a downstream task succeeded or failed; a user rated a pattern via a UI; an offline eval pipeline produced a score that should be reflected in the live memory.

Parameter	Type	Default	Description
`pattern_key`	`str`	—	The storage key of the pattern to refine. Obtain from `Match.pattern_key` on a prior `recall()`.
`eval_score`	`float`	—	New quality observation, `[0.0, 10.0]`. The eval-weighted multiplier reads the most recent observation for this key.
`task`	`str | None`	`None`	Optional task description attached to the eval record; defaults to the pattern's own `task` field.
`feedback`	`str`	`""`	Optional free-form note. Not consulted by ranking, but surfaces in `get_feedback` and evolution pipelines.

Returns: None. Raises: ValidationError if the key does not exist or the score is out of range.

Does not mutate Pattern.success_score — survival signals (reuse_count, success_score, aging) remain orthogonal to ranking. See concepts.md for the full survival-vs-ranking model.

# Downstream task used a pattern and succeeded — record positive evidence:
matches = mem.recall("Parse CSV", limit=1)
mem.refine_pattern(matches[0].pattern_key, 9.0, feedback="shipped to prod")

# The next eval-weighted recall call already sees the boost:
matches_after = mem.recall("Parse CSV", limit=1, eval_weighted=True)
# matches_after[0].effective_score is higher than before
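To make the "most recent observation wins" behaviour concrete, here is a toy model of an append-only eval store. Everything in it is an assumption made for illustration: the in-memory `eval_store`, the helper names, the neutral multiplier of 1.0 when no evidence exists, and the linear mapping from a 0–10 score onto the documented [0.5, 1.0] multiplier range.

```python
from collections import defaultdict

# Hypothetical in-memory eval store: pattern key -> list of observations, oldest first.
eval_store: dict[str, list[dict]] = defaultdict(list)


def record_observation(pattern_key: str, eval_score: float, feedback: str = "") -> None:
    """Append a quality observation, rejecting scores outside [0.0, 10.0]."""
    if not 0.0 <= eval_score <= 10.0:
        raise ValueError("eval_score must be within [0.0, 10.0]")
    eval_store[pattern_key].append({"score": eval_score, "feedback": feedback})


def quality_multiplier(pattern_key: str) -> float:
    """Map the most recent score onto [0.5, 1.0] (linear mapping is an assumption)."""
    observations = eval_store.get(pattern_key)
    if not observations:
        return 1.0  # assumption: no evidence means no penalty
    latest_score = observations[-1]["score"]
    return 0.5 + 0.5 * latest_score / 10.0


record_observation("patterns/csv-stats", 4.0, feedback="flaky on large files")
before = quality_multiplier("patterns/csv-stats")
record_observation("patterns/csv-stats", 9.0, feedback="shipped to prod")
after = quality_multiplier("patterns/csv-stats")
print(before, after)
```

The earlier observation stays in the store for feedback and evolution purposes, but only the latest score drives the multiplier in this model.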

compose()

mem.compose(task: str) -> Pipeline

Decompose a task into a staged pipeline from existing success patterns.

Requires: llm provider configured.

Returns: Pipeline with:

Field	Type	Description
`stages`	`list[Stage]`	Pipeline stages with task, reads, writes
`valid`	`bool`	Whether contract validation passed
`contract_errors`	`list[str]`	Validation errors (if any)

Raises: ProviderError if no LLM configured.

pipeline = mem.compose(task="Fetch data, analyze, write report")
for stage in pipeline.stages:
    print(f"[{stage.task}] reads={stage.reads} writes={stage.writes}")
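The reference does not spell out what contract validation checks. One plausible reading, sketched below, is that every name a stage reads must have been written by an earlier stage. The `Stage` dataclass and `find_contract_errors` function are hypothetical stand-ins and may not match the library's actual rules.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """Hypothetical stand-in for a pipeline stage."""
    task: str
    reads: list[str] = field(default_factory=list)
    writes: list[str] = field(default_factory=list)


def find_contract_errors(stages: list[Stage]) -> list[str]:
    """Report reads that no earlier stage has written (assumed contract rule)."""
    available: set[str] = set()
    errors: list[str] = []
    for index, stage in enumerate(stages):
        for name in stage.reads:
            if name not in available:
                errors.append(f"stage {index} ({stage.task!r}) reads {name!r} before it is written")
        available.update(stage.writes)
    return errors


stages = [
    Stage("Fetch data", writes=["raw.json"]),
    Stage("Analyze", reads=["raw.json"], writes=["stats.json"]),
    Stage("Write report", reads=["stats.json", "template.md"], writes=["report.md"]),
]
errors = find_contract_errors(stages)
valid = not errors
print(valid, errors)
```

In this example the last stage reads `template.md`, which nothing produces, so the pipeline would be flagged as invalid with a single error.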

get_feedback()

mem.get_feedback(
    task_type: str | None = None,
    limit: int = 5,
) -> list[str]

Get recurring feedback patterns for prompt injection.

Returns only feedback with count >= 2, sorted by frequency and freshness.

feedback = mem.get_feedback(limit=4)
# ["Add error handling for missing input files.", ...]
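Since the returned strings are meant for prompt injection, a small helper can append them to an agent's base prompt. The sketch below is one possible format; `inject_feedback` and the "Recurring issues to avoid" heading are illustrative choices, not part of the Engramia API.

```python
def inject_feedback(base_prompt: str, feedback: list[str]) -> str:
    """Append recurring feedback items to a prompt as a bullet list."""
    if not feedback:
        return base_prompt  # nothing recurring yet: leave the prompt untouched
    bullets = "\n".join(f"- {item}" for item in feedback)
    return f"{base_prompt}\n\nRecurring issues to avoid:\n{bullets}"


prompt = inject_feedback(
    "You are a coder. Write a script that parses a CSV file.",
    ["Add error handling for missing input files."],
)
print(prompt)
```

In practice the second argument would be the result of `mem.get_feedback(...)`.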

delete_pattern()

mem.delete_pattern(pattern_key: str) -> bool

Permanently delete a stored pattern. Returns True if the pattern existed.

matches = mem.recall(task="Parse CSV")
deleted = mem.delete_pattern(matches[0].pattern_key)

run_aging()

mem.run_aging() -> int

Apply time-decay to all success patterns. Returns the number of pruned patterns.

Decay: success_score *= 0.98 ** weeks
Patterns with score < 0.1 are removed
Run periodically (e.g., weekly cron)
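The decay rule can be explored numerically to see how long an unused pattern survives. The sketch below applies the 0.98-per-week factor and the 0.1 pruning threshold from the list above; it assumes `weeks` is the total age since the score was set and ignores anything (such as reuse) that might refresh a pattern. The helper names are illustrative.

```python
DECAY_PER_WEEK = 0.98
PRUNE_THRESHOLD = 0.1


def aged_score(success_score: float, weeks: float) -> float:
    """Score after `weeks` of decay at 2% per week."""
    return success_score * DECAY_PER_WEEK ** weeks


def weeks_until_pruned(success_score: float) -> int:
    """Smallest whole number of weeks after which the score drops below the threshold."""
    weeks = 0
    while aged_score(success_score, weeks) >= PRUNE_THRESHOLD:
        weeks += 1
    return weeks


print(aged_score(8.5, 52))      # score after one year
print(weeks_until_pruned(8.5))  # weeks until removal
```

Under these assumptions a pattern that starts at 8.5 takes about 220 weeks to fall below the threshold, so aging alone prunes slowly.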

run_feedback_decay()

mem.run_feedback_decay() -> None

Apply time-decay to feedback clusters (10% per week).

metrics

mem.metrics -> Metrics

Current memory instance metrics.

Field	Type	Description
`runs`	`int`	Total recorded runs
`success_rate`	`float`	Proportion of successful runs
`avg_eval_score`	`float | None`	Average eval score
`pattern_count`	`int`	Current number of patterns
`pipeline_reuse`	`int`	Runs where an existing pattern was reused

evolve_prompt()

mem.evolve_prompt(
    role: str,
    current_prompt: str,
) -> EvolutionResult

Generate an improved prompt based on recurring quality issues.

Requires: llm provider configured.

result = mem.evolve_prompt(role="coder", current_prompt="You are a coder...")
if result.accepted:
    print(result.improved_prompt)

analyze_failures()

mem.analyze_failures(min_count: int = 1) -> list[FailureCluster]

Cluster recurring errors to identify systemic problems.

clusters = mem.analyze_failures(min_count=2)
for c in clusters:
    print(f"{c.representative} (count={c.total_count})")

register_skills() / find_by_skills()

mem.register_skills(pattern_key: str, skills: list[str]) -> None
mem.find_by_skills(required: list[str], match_all: bool = True) -> list[Match]

Tag patterns with capabilities and search by skills.

mem.register_skills(key, ["csv_parsing", "statistics"])
results = mem.find_by_skills(["csv_parsing"], match_all=True)
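The `match_all` flag reads naturally as a choice between "has every required skill" and "has at least one". The sketch below expresses that with set operations; the interpretation of `match_all=False` as "any" is an assumption, and `matches_skills` is a hypothetical helper rather than a library function.

```python
def matches_skills(pattern_skills: set[str], required: list[str], match_all: bool = True) -> bool:
    """Check a pattern's skill tags against the required skills."""
    wanted = set(required)
    if match_all:
        return wanted <= pattern_skills  # every required skill is present
    return bool(wanted & pattern_skills)  # at least one required skill is present


tags = {"csv_parsing", "statistics"}
print(matches_skills(tags, ["csv_parsing", "plotting"], match_all=True))
print(matches_skills(tags, ["csv_parsing", "plotting"], match_all=False))
```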

export() / import_data()

mem.export() -> list[dict]
mem.import_data(records: list[dict], overwrite: bool = False) -> int

Back up and migrate patterns (JSONL-compatible).

# Export
records = mem.export()

# Import into a new instance
imported = new_mem.import_data(records)
print(f"Imported {imported} patterns")
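Because the export is a `list[dict]`, it can be written to disk as JSONL (one JSON object per line) with the standard library. The sketch below shows a round trip through a temporary file; the record shown is a made-up placeholder, since the actual record schema is not documented here, and `write_jsonl` / `read_jsonl` are hypothetical helpers.

```python
import json
import tempfile
from pathlib import Path


def write_jsonl(records: list[dict], path: Path) -> None:
    """Write one JSON object per line."""
    with path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> list[dict]:
    """Read a JSONL file back into a list of dicts, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Placeholder record; real records come from mem.export().
records = [{"key": "example-key", "data": {"task": "Parse CSV and compute statistics"}}]

with tempfile.TemporaryDirectory() as tmp:
    backup = Path(tmp) / "patterns.jsonl"
    write_jsonl(records, backup)
    restored = read_jsonl(backup)

print(restored == records)
```

The restored list can then be passed to `import_data()` on the target instance.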

Exceptions

from engramia import EngramiaError, ProviderError, ValidationError, StorageError

Exception	When
`EngramiaError`	Base exception for all Engramia errors
`ProviderError`	LLM provider not configured or call failed
`ValidationError`	Invalid input (empty task, score out of range, etc.)
`StorageError`	Storage backend error (file I/O, database)

try:
    result = mem.evaluate(task, code)
except ProviderError:
    pass  # no LLM configured
except ValidationError:
    pass  # invalid input