Python API Reference

The entire public API is accessed through the Memory class.

from engramia import Memory

Memory

Constructor

Memory(
    llm: LLMProvider | None = None,
    embeddings: EmbeddingProvider,
    storage: StorageBackend,
)

| Parameter | Required | Description |
|---|---|---|
| llm | No | LLM provider for evaluate, compose, evolve. None = learn/recall only. |
| embeddings | Yes | Embedding provider for semantic search |
| storage | Yes | Storage backend (JSON or PostgreSQL) |

learn()

mem.learn(
    task: str,
    code: str,
    eval_score: float,
    output: str | None = None,
) -> LearnResult

Store a successful agent run as a success pattern.

| Parameter | Type | Description |
|---|---|---|
| task | str | What the agent was asked to do (max 10,000 chars) |
| code | str | The code/solution produced (max 500,000 chars) |
| eval_score | float | Quality rating, 0–10 |
| output | str \| None | Agent stdout/output (optional) |

Returns: LearnResult with .stored (bool) and .pattern_count (int).

Raises: ValidationError for invalid inputs.

result = mem.learn(
    task="Parse CSV and compute statistics",
    code="import csv\nimport statistics\n...",
    eval_score=8.5,
    output="mean=42.3, std=7.1",
)

recall()

mem.recall(
    task: str,
    limit: int = 5,
    deduplicate: bool = True,
    eval_weighted: bool = True,
    recency_weight: float = 0.0,
    recency_half_life_days: float = 30.0,
) -> list[Match]

Find relevant success patterns for a new task via semantic search.

| Parameter | Type | Default | Description |
|---|---|---|---|
| task | str | | The task to search for |
| limit | int | 5 | Max results to return |
| deduplicate | bool | True | Group similar tasks (Jaccard > 0.7), return only top-scoring per group |
| eval_weighted | bool | True | Multiply similarity by eval quality multiplier [0.5, 1.0] |
| recency_weight | float | 0.0 | Bias toward recently-stored patterns via exponential half-life decay on Pattern.timestamp. 0.0 = off (no behaviour change), 1.0 = full decay, intermediate values soften the effect via recency_factor ** recency_weight. Multiplies with eval_weighted when both are active. |
| recency_half_life_days | float | 30.0 | Half-life of the recency decay, in days. A pattern this many days old contributes a recency_factor of 0.5; twice that, 0.25. Ignored when recency_weight == 0. |

Returns: list[Match] sorted by effective score descending.

Each Match contains:

| Field | Type | Description |
|---|---|---|
| similarity | float | Cosine similarity (0.0–1.0) |
| effective_score | float \| None | Rank-ordering score produced by recall when any non-similarity signal is active (eval_weighted=True and/or recency_weight>0); None on the plain similarity path |
| reuse_tier | str | "duplicate", "adapt", or "fresh" |
| pattern_key | str | Storage key for delete_pattern() |
| pattern | Pattern | Full pattern with task, design, success_score, reuse_count, timestamp |

matches = mem.recall(task="Read CSV and calculate averages", limit=5)
for m in matches:
    print(f"{m.similarity:.2f} | {m.pattern.task}")
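
The deduplicate option groups tasks whose Jaccard similarity exceeds 0.7. This reference does not say how tasks are tokenized, so the sketch below assumes lowercase, whitespace-split tokens purely for illustration. `jaccard` and `is_near_duplicate` are hypothetical helper names, not part of the library.

```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two strings' token sets: |A & B| / |A | B|."""
    # Assumption: lowercase, whitespace-split tokens (the real tokenizer may differ).
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0  # two empty strings are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_near_duplicate(a: str, b: str, threshold: float = 0.7) -> bool:
    """True when two tasks would land in the same group (Jaccard > threshold)."""
    return jaccard(a, b) > threshold
```

Under these assumptions, two tasks that differ only in letter case count as duplicates, while tasks sharing four of six distinct words do not.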

Recency-aware recall

For workloads where stale patterns should lose rank over time — codebase refactors, deprecated APIs, post-incident rewrites — pass recency_weight > 0:

# Prefer patterns stored in the last couple of weeks:
recent = mem.recall(
    task="Apply the current auth middleware",
    recency_weight=1.0,
    recency_half_life_days=14.0,
)

The blended formula is

recency_factor  = 0.5 ** (max(0, now - pattern.timestamp) / (H * 86400))
effective_score = similarity × quality_factor × recency_factor ** recency_weight

where quality_factor is the eval multiplier when eval_weighted=True (else 1) and H is recency_half_life_days. Future-dated timestamps (clock skew) are clamped to age=0 so they do not award a >1 boost, matching the behaviour of SuccessPatternStore.run_aging.

recency_weight=0.0 is a strict no-op and preserves pre-0.6.7 output byte-for-byte.
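
The formula can be reproduced outside the library to sanity-check a ranking. The sketch below is a direct transcription, assuming ages are expressed in seconds. `recency_factor` and `effective_score` here are standalone illustrative functions, not Engramia's own implementation.

```python
SECONDS_PER_DAY = 86400


def recency_factor(age_seconds: float, half_life_days: float = 30.0) -> float:
    """0.5 ** (age / half-life); future-dated timestamps are clamped to age 0."""
    age = max(0.0, age_seconds)
    return 0.5 ** (age / (half_life_days * SECONDS_PER_DAY))


def effective_score(
    similarity: float,
    age_seconds: float,
    quality_factor: float = 1.0,
    recency_weight: float = 0.0,
    half_life_days: float = 30.0,
) -> float:
    """similarity * quality_factor * recency_factor ** recency_weight."""
    factor = recency_factor(age_seconds, half_life_days)
    return similarity * quality_factor * factor ** recency_weight
```

With recency_weight=1.0, a pattern exactly one half-life old keeps half of its similarity-times-quality score. With recency_weight=0.0 the recency term becomes 1 and the score is unchanged.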


evaluate()

mem.evaluate(
    task: str,
    code: str,
    output: str | None = None,
    num_evals: int = 3,
    *,
    pattern_key: str | None = None,
) -> EvalResult

Run N independent LLM evaluations and aggregate results.

Requires: llm provider configured.

| Parameter | Type | Default | Description |
|---|---|---|---|
| task | str | | Task description |
| code | str | | Code to evaluate |
| output | str \| None | None | Agent output |
| num_evals | int | 3 | Number of parallel evaluations (1–10) |
| pattern_key | str \| None | None | Pattern identifier to attach this evaluation to. When set, the result feeds directly into eval_weighted recall for that specific pattern, closing the learn → evaluate → improve loop. When None (default), the result is keyed by sha256(code)[:12], preserving the pre-0.6.8 behaviour for free-floating code not tied to a stored pattern. |

Returns: EvalResult with:

| Field | Type | Description |
|---|---|---|
| median_score | float | Aggregated score (0–10) |
| variance | float | Score variance across runs |
| high_variance | bool | True if variance > 1.5 |
| feedback | str | Feedback from the worst run |
| adversarial_detected | bool | True if hardcoded output detected |

Raises: ProviderError if no LLM configured. ValidationError if pattern_key is provided but no pattern exists under that key.

# Evaluate a stored pattern — result feeds into its future recall ranking:
matches = mem.recall("Parse CSV", limit=1)
result = mem.evaluate(
    task="Parse CSV",
    code=matches[0].pattern.design["code"],
    pattern_key=matches[0].pattern_key,
)

# Or evaluate free-floating code — keyed by sha256(code):
result = mem.evaluate(task="Parse CSV", code=candidate_code)
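
To make the aggregation and the default key concrete, here is a minimal standalone sketch. It assumes the key is the first 12 hex characters of the SHA-256 digest of the UTF-8 encoded code, and that variance means population variance. Neither detail is confirmed by this reference, and `default_eval_key` and `aggregate` are hypothetical names.

```python
import hashlib
import statistics


def default_eval_key(code: str) -> str:
    """First 12 hex characters of the SHA-256 digest of the code."""
    # Assumption: UTF-8 encoding and a hexadecimal digest.
    return hashlib.sha256(code.encode("utf-8")).hexdigest()[:12]


def aggregate(scores: list[float], variance_threshold: float = 1.5) -> dict:
    """Median, variance, and high-variance flag for a batch of eval scores."""
    # Assumption: population variance; the library may use sample variance.
    variance = statistics.pvariance(scores)
    return {
        "median_score": statistics.median(scores),
        "variance": variance,
        "high_variance": variance > variance_threshold,
    }
```

The median keeps a single outlying run from dragging the aggregate score, while the variance flag signals that the runs disagreed enough to warrant a closer look.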

refine_pattern()

mem.refine_pattern(
    pattern_key: str,
    eval_score: float,
    *,
    task: str | None = None,
    feedback: str = "",
) -> None

Record a new quality observation against an existing pattern without running an LLM evaluation. Appends an entry to the eval store so the next eval_weighted recall call picks up the updated evidence.

Typical triggers: a downstream task succeeded or failed; a user rated a pattern via a UI; an offline eval pipeline produced a score that should be reflected in the live memory.

| Parameter | Type | Default | Description |
|---|---|---|---|
| pattern_key | str | | The storage key of the pattern to refine. Obtain from Match.pattern_key on a prior recall(). |
| eval_score | float | | New quality observation, [0.0, 10.0]. The eval-weighted multiplier reads the most recent observation for this key. |
| task | str \| None | None | Optional task description attached to the eval record; defaults to the pattern's own task field. |
| feedback | str | "" | Optional free-form note. Not consulted by ranking, but surfaces in get_feedback and evolution pipelines. |

Returns: None. Raises: ValidationError if the key does not exist or the score is out of range.

Does not mutate Pattern.success_score — survival signals (reuse_count, success_score, aging) remain orthogonal to ranking. See concepts.md for the full survival-vs-ranking model.

# Downstream task used a pattern and succeeded — record positive evidence:
matches = mem.recall("Parse CSV", limit=1)
mem.refine_pattern(matches[0].pattern_key, 9.0, feedback="shipped to prod")

# The next eval-weighted recall call already sees the boost:
matches_after = mem.recall("Parse CSV", limit=1, eval_weighted=True)
# matches_after[0].effective_score is higher than before

compose()

mem.compose(task: str) -> Pipeline

Decompose a task into a staged pipeline from existing success patterns.

Requires: llm provider configured.

Returns: Pipeline with:

| Field | Type | Description |
|---|---|---|
| stages | list[Stage] | Pipeline stages with task, reads, writes |
| valid | bool | Whether contract validation passed |
| contract_errors | list[str] | Validation errors (if any) |

Raises: ProviderError if no LLM configured.

pipeline = mem.compose(task="Fetch data, analyze, write report")
for stage in pipeline.stages:
    print(f"[{stage.task}] reads={stage.reads} writes={stage.writes}")
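
This reference does not spell out what contract validation checks. One plausible reading is that every name a stage reads must have been written by an earlier stage. The sketch below illustrates that reading only. The `Stage` dataclass is a stand-in for the library's type, and `contract_errors` is a hypothetical helper.

```python
from dataclasses import dataclass, field


@dataclass
class Stage:
    """Stand-in for a pipeline stage; not the library's own class."""
    task: str
    reads: list[str] = field(default_factory=list)
    writes: list[str] = field(default_factory=list)


def contract_errors(stages: list[Stage]) -> list[str]:
    """Report every read that no earlier stage has written."""
    available: set[str] = set()
    errors: list[str] = []
    for index, stage in enumerate(stages):
        for name in stage.reads:
            if name not in available:
                errors.append(
                    f"stage {index} ({stage.task}) reads '{name}' before it is written"
                )
        # Outputs only become visible to the stages that follow.
        available.update(stage.writes)
    return errors
```

Under this reading, a pipeline would be valid exactly when the returned list is empty.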

get_feedback()

mem.get_feedback(
    task_type: str | None = None,
    limit: int = 5,
) -> list[str]

Get recurring feedback patterns for prompt injection.

Returns only feedback with count >= 2, sorted by frequency and freshness.

feedback = mem.get_feedback(limit=4)
# ["Add error handling for missing input files.", ...]
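
Since the returned strings are meant for prompt injection, one simple approach is to append them to a base prompt as a checklist. The helper below is a hypothetical sketch; `inject_feedback` and its wording are assumptions, not library behaviour.

```python
def inject_feedback(base_prompt: str, feedback: list[str]) -> str:
    """Append recurring feedback to a prompt as a bulleted checklist."""
    if not feedback:
        return base_prompt  # nothing recurring yet; leave the prompt untouched
    bullets = "\n".join(f"- {item}" for item in feedback)
    return f"{base_prompt}\n\nAvoid these recurring issues:\n{bullets}"
```

In practice the `feedback` argument would be the list returned by mem.get_feedback().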

delete_pattern()

mem.delete_pattern(pattern_key: str) -> bool

Permanently delete a stored pattern. Returns True if the pattern existed.

matches = mem.recall(task="Parse CSV")
deleted = mem.delete_pattern(matches[0].pattern_key)

run_aging()

mem.run_aging() -> int

Apply time-decay to all success patterns. Returns the number of pruned patterns.

  • Decay: success_score *= 0.98 ** weeks
  • Patterns with score < 0.1 are removed
  • Run periodically (e.g., weekly cron)
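
The decay rule above can be worked through with a few lines of arithmetic. This sketch assumes decay is applied in whole-week steps when estimating time to pruning. Whether the library uses fractional weeks is not stated here, and `decayed_score` and `weeks_until_pruned` are illustrative names.

```python
DECAY_PER_WEEK = 0.98
PRUNE_THRESHOLD = 0.1


def decayed_score(success_score: float, weeks: float) -> float:
    """Apply the documented decay: success_score * 0.98 ** weeks."""
    return success_score * DECAY_PER_WEEK ** weeks


def weeks_until_pruned(success_score: float) -> int:
    """Whole weeks of decay before the score drops below the prune threshold."""
    weeks = 0
    while success_score >= PRUNE_THRESHOLD:
        success_score *= DECAY_PER_WEEK
        weeks += 1
    return weeks
```

Because the decay is only 2% per week, patterns fade slowly; pruning is a long-horizon cleanup rather than a short-term ranking signal.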

run_feedback_decay()

mem.run_feedback_decay() -> None

Apply time-decay to feedback clusters (10% per week).


metrics

mem.metrics -> Metrics

Metrics for the current Memory instance.

| Field | Type | Description |
|---|---|---|
| runs | int | Total recorded runs |
| success_rate | float | Proportion of successful runs |
| avg_eval_score | float \| None | Average eval score |
| pattern_count | int | Current number of patterns |
| pipeline_reuse | int | Runs where an existing pattern was reused |

evolve_prompt()

mem.evolve_prompt(
    role: str,
    current_prompt: str,
) -> EvolutionResult

Generate an improved prompt based on recurring quality issues.

Requires: llm provider configured.

result = mem.evolve_prompt(role="coder", current_prompt="You are a coder...")
if result.accepted:
    print(result.improved_prompt)

analyze_failures()

mem.analyze_failures(min_count: int = 1) -> list[FailureCluster]

Cluster recurring errors to identify systemic problems.

clusters = mem.analyze_failures(min_count=2)
for c in clusters:
    print(f"{c.representative} (count={c.total_count})")

register_skills() / find_by_skills()

mem.register_skills(pattern_key: str, skills: list[str]) -> None
mem.find_by_skills(required: list[str], match_all: bool = True) -> list[Match]

Tag patterns with capabilities and search by skills.

mem.register_skills(key, ["csv_parsing", "statistics"])
results = mem.find_by_skills(["csv_parsing"], match_all=True)
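
The match_all flag most naturally reads as "all required skills" versus "at least one". That interpretation of match_all=False is an assumption, and `matches_skills` below is a hypothetical standalone helper showing the set logic.

```python
def matches_skills(
    pattern_skills: set[str],
    required: list[str],
    match_all: bool = True,
) -> bool:
    """match_all=True: every required skill is present; False: at least one is."""
    wanted = set(required)
    if match_all:
        return wanted <= pattern_skills  # subset test
    return bool(wanted & pattern_skills)  # non-empty intersection
```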

export() / import_data()

mem.export() -> list[dict]
mem.import_data(records: list[dict], overwrite: bool = False) -> int

Back up and migrate patterns (JSONL-compatible).

# Export
records = mem.export()

# Import into a new instance
imported = new_mem.import_data(records)
print(f"Imported {imported} patterns")
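
Because the exported records are described as JSONL-compatible, they can be persisted with the standard library alone. The sketch below assumes every record is JSON-serializable; `write_jsonl` and `read_jsonl` are hypothetical helpers, not part of Engramia.

```python
import json
from pathlib import Path


def write_jsonl(records: list[dict], path: Path) -> None:
    """Write one JSON object per line."""
    with path.open("w", encoding="utf-8") as fh:
        for record in records:
            fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> list[dict]:
    """Read records back, skipping blank lines."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

A backup then amounts to passing the result of mem.export() to `write_jsonl`, and a restore to passing the output of `read_jsonl` to import_data().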

Exceptions

from engramia import EngramiaError, ProviderError, ValidationError, StorageError

| Exception | When |
|---|---|
| EngramiaError | Base exception for all Engramia errors |
| ProviderError | LLM provider not configured or call failed |
| ValidationError | Invalid input (empty task, score out of range, etc.) |
| StorageError | Storage backend error (file I/O, database) |

try:
    result = mem.evaluate(task, code)
except ProviderError:
    pass  # no LLM configured
except ValidationError:
    pass  # invalid input
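
Since EngramiaError is the base for all Engramia errors, a handler can catch specific types first and fall back to the base class. The sketch below uses stand-in classes so it runs without the library. It assumes the three specific exceptions subclass EngramiaError directly, and `classify` is a hypothetical helper.

```python
# Stand-in classes mirroring the documented hierarchy; in real code,
# import these from engramia instead of defining them.
class EngramiaError(Exception):
    """Base exception for all Engramia errors."""


class ProviderError(EngramiaError):
    """LLM provider not configured or call failed."""


class ValidationError(EngramiaError):
    """Invalid input."""


class StorageError(EngramiaError):
    """Storage backend error."""


def classify(call) -> str:
    """Run a callable and label the outcome, most specific handler first."""
    try:
        call()
    except ValidationError:
        return "invalid input"
    except EngramiaError:
        return "engramia error"  # catches ProviderError, StorageError, and others
    return "ok"
```

Ordering matters: placing the base-class handler first would shadow the more specific ones.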