Data Handling
Engramia v0.6.0 · Classification: Internal
What Data Engramia Stores
Engramia stores agent execution patterns: structured records created when an agent successfully completes a task. By design, it does not store end-user personal data. The data model:
| Field | Type | Description |
|---|---|---|
| `task` | string | Task description provided by the agent system |
| `code` / `design` | string | Solution produced by the agent (code, plan, or text) |
| `eval_score` | float 0–10 | Quality score from the multi-evaluator |
| `output` | string (optional) | Agent stdout/result |
| `success_score` | float | Decaying quality score (aging applied weekly) |
| `reuse_count` | int | Number of times this pattern was recalled and reused |
| `run_id` | string (optional) | Caller-supplied run identifier for tracing |
| `classification` | enum | Data classification: PUBLIC / INTERNAL / CONFIDENTIAL |
| `source` | string (optional) | System that created this pattern |
| `author` | string (optional) | Agent or user identifier |
| `expires_at` | datetime (optional) | Explicit expiry override |
| `redacted` | bool | Whether PII was detected and redacted before storage |
Additionally stored per-pattern:
- Embedding vector (1536 dimensions for text-embedding-3-small): used for semantic search only; not human-readable
- Feedback records: recurring quality issues extracted from eval output (text only, no PII)
- ROI events: anonymised recall/learn events for analytics (no task content, only metadata)
- Audit log entries: security events (see security-architecture.md)
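To make the data model above concrete, here is a minimal sketch of a pattern record as a Python dataclass. The class names `PatternRecord` and `Classification` are hypothetical and are not taken from Engramia's code. The defaults (`reuse_count=0`, `INTERNAL` classification) and the range check on `eval_score` are assumptions based on the field descriptions.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum
from typing import Optional


class Classification(Enum):
    PUBLIC = "PUBLIC"
    INTERNAL = "INTERNAL"
    CONFIDENTIAL = "CONFIDENTIAL"


@dataclass
class PatternRecord:
    """Hypothetical mirror of the documented fields, not Engramia's actual class."""

    task: str
    code: str                      # the documented "code / design" field
    eval_score: float              # documented range: 0 to 10
    success_score: float           # decays over time (see aging)
    reuse_count: int = 0           # assumed starting value
    output: Optional[str] = None
    run_id: Optional[str] = None
    classification: Classification = Classification.INTERNAL  # assumed default
    source: Optional[str] = None
    author: Optional[str] = None
    expires_at: Optional[datetime] = None
    redacted: bool = False

    def __post_init__(self) -> None:
        # Reject scores outside the documented 0-10 range.
        if not 0.0 <= self.eval_score <= 10.0:
            raise ValueError("eval_score must be between 0 and 10")


record = PatternRecord(
    task="Parse a CSV file",
    code="import csv",
    eval_score=8.5,
    success_score=8.5,
)
```

Optional fields default to `None`, so a caller only has to supply the task, the solution, and the two scores.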
Where Data Is Stored
JSON storage (development)
Data is written to the local filesystem under ENGRAMIA_DATA_PATH (default: ./engramia_data) as JSON files. Not suitable for production with sensitive data.
PostgreSQL + pgvector (production)
Tables created by Alembic migrations:
| Table | Contents |
|---|---|
| `memory_data` | Pattern records (all fields above) |
| `memory_embeddings` | pgvector columns for ANN search |
| `tenants` | Tenant registry with retention policies |
| `projects` | Project registry with retention and classification defaults |
| `api_keys` | Hashed API keys with role and quota |
| `audit_log` | Security event log |
| `jobs` | Async job queue |
| `analytics_events` | ROI event stream (rolling 10 000 events per scope) |
All tables include tenant_id and project_id columns, and every query is scoped to them.
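The "rolling 10 000 events per scope" behaviour of the analytics stream can be illustrated with a bounded queue per (tenant, project) pair. This is an in-memory sketch only. The class `RollingEventStore` and its methods are hypothetical, and the actual PostgreSQL-backed implementation may work differently.

```python
from collections import defaultdict, deque
from typing import Any, Deque, Dict, List, Tuple

Scope = Tuple[str, str]  # (tenant_id, project_id)


class RollingEventStore:
    """Keeps only the most recent `max_events` events for each scope."""

    def __init__(self, max_events: int = 10_000) -> None:
        self._max_events = max_events
        # Each scope gets its own bounded deque; old events fall off the left.
        self._events: Dict[Scope, Deque[Any]] = defaultdict(
            lambda: deque(maxlen=self._max_events)
        )

    def record(self, tenant_id: str, project_id: str, event: Any) -> None:
        self._events[(tenant_id, project_id)].append(event)

    def events(self, tenant_id: str, project_id: str) -> List[Any]:
        # .get() avoids creating an empty deque for unknown scopes.
        return list(self._events.get((tenant_id, project_id), ()))


# Small cap for demonstration; the documented cap is 10 000.
store = RollingEventStore(max_events=3)
for i in range(5):
    store.record("tenant-a", "project-1", {"seq": i})
```

Because the queue is keyed by scope, events recorded for one tenant or project never displace another scope's events.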
Data Lifecycle
Retention
Default retention: 365 days from last update. Retention is configurable per tenant and per project.
Pattern expires_at (if set) takes precedence over project/tenant retention.
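The precedence rule above can be sketched as a small resolver. The function `effective_expiry` and its parameters are hypothetical. The source states only that a pattern's `expires_at` wins over project/tenant retention, so the choice here to let a project setting override a tenant setting is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

DEFAULT_RETENTION_DAYS = 365  # documented default


def effective_expiry(
    updated_at: datetime,
    expires_at: Optional[datetime] = None,
    project_days: Optional[int] = None,
    tenant_days: Optional[int] = None,
) -> datetime:
    """Return the moment a pattern becomes eligible for retention cleanup."""
    # 1. An explicit per-pattern expiry always takes precedence.
    if expires_at is not None:
        return expires_at
    # 2. Assumed order: project setting, then tenant setting, then the default.
    if project_days is not None:
        days = project_days
    elif tenant_days is not None:
        days = tenant_days
    else:
        days = DEFAULT_RETENTION_DAYS
    # Retention is counted from the last update, not from creation.
    return updated_at + timedelta(days=days)


updated = datetime(2024, 1, 1, tzinfo=timezone.utc)
override = datetime(2024, 2, 1, tzinfo=timezone.utc)
```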
Retention cleanup runs as a scheduled async job (retention_cleanup) that marks expired patterns for deletion. Trigger it manually via POST /v1/governance/retention/apply, or let the job queue run it automatically.
Aging (quality decay)
Separately from retention, patterns decay in quality over time:
- success_score *= 0.98^weeks_since_created
- Patterns with success_score < 0.1 are pruned automatically by run_aging()
- This is a quality control mechanism, not a privacy/compliance mechanism
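As a worked illustration of the decay rule, the sketch below assumes the factor is applied to the score a pattern had at creation, so that the aged score is `initial * 0.98 ** weeks`. The function names are hypothetical, and `run_aging()` itself may compute this differently.

```python
import math

DECAY_PER_WEEK = 0.98    # documented weekly decay factor
PRUNE_THRESHOLD = 0.1    # documented pruning threshold


def aged_score(initial_score: float, weeks_since_created: float) -> float:
    """Score after exponential decay (assumes decay from the creation-time score)."""
    return initial_score * DECAY_PER_WEEK ** weeks_since_created


def should_prune(score: float) -> bool:
    return score < PRUNE_THRESHOLD


def weeks_until_pruned(initial_score: float) -> int:
    """Smallest whole number of weeks after which the score drops below the threshold."""
    if initial_score < PRUNE_THRESHOLD:
        return 0
    # Solve initial * 0.98**w < threshold for w, then step forward if needed
    # to respect the strict inequality.
    weeks = math.ceil(math.log(PRUNE_THRESHOLD / initial_score) / math.log(DECAY_PER_WEEK))
    while not should_prune(aged_score(initial_score, weeks)):
        weeks += 1
    return weeks
```

Under these assumptions a pattern that starts at the maximum score of 10 would be pruned after about 228 weeks if it is never refreshed, which is far longer than the default 365-day retention window.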
Deletion
Per-pattern: DELETE /v1/patterns/{key} — immediate, hard delete.
Per-project (GDPR Art. 17 right to erasure): DELETE /v1/governance/projects/{id}
- Cascades: pattern records + embeddings → jobs → audit_log scrub (detail field set to NULL) → API keys revoked
- Returns a DeletionResult with per-type counts
Per-tenant: DELETE /v1/governance/tenants/{id} — same cascade, all projects under tenant
Data Portability (GDPR Art. 20)
Export all patterns for the current scope as NDJSON via GET /v1/governance/export.
Each record includes all pattern fields plus governance metadata. Records can be re-imported via POST /import or Memory.import_data().
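NDJSON carries one JSON object per line, so an export can be processed as a stream without loading it all at once. The sketch below parses such a payload. The helper `parse_ndjson` and the sample records are hypothetical, and the fields shown are a subset chosen for illustration.

```python
import json
from typing import Iterable, Iterator


def parse_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of an NDJSON payload."""
    for line in lines:
        line = line.strip()
        if line:  # tolerate blank lines, e.g. a trailing newline
            yield json.loads(line)


# Made-up sample payload standing in for an export response body.
sample = (
    '{"task": "Parse a CSV file", "classification": "INTERNAL"}\n'
    '{"task": "Send a reminder", "classification": "PUBLIC"}\n'
)
records = list(parse_ndjson(sample.splitlines()))
```

The same generator works on an open file handle or a streamed HTTP response body, since both can be iterated line by line.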
PII Detection and Redaction
The RedactionPipeline (opt-in) scans pattern content before storage for:
- Email addresses
- IPv4 addresses
- JWT tokens
- OpenAI/Anthropic API keys
- AWS access keys
- GitHub tokens
- Hex secrets ≥ 32 characters
- Keyword-prefixed secrets (password=, token=, secret=, key=)
When PII is found:
1. The content is replaced with [REDACTED] before storage
2. The redacted=true flag is set on the pattern
3. A PII_REDACTED audit event is logged
4. The caller receives a redacted: true field in the API response
Enable per-instance:
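```python
from engramia import Memory  # assumed import path; adjust to your installation
from engramia.governance.redaction import RedactionPipeline

# Pass the default pipeline so pattern content is scanned before storage.
mem = Memory(..., redaction=RedactionPipeline.default())
```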
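To illustrate the kind of scanning described in this section, here is a simplified regex-based sketch covering four of the listed categories. It is not the RedactionPipeline implementation. The patterns are deliberately loose, the function `redact` is hypothetical, and replacing only the matched span (rather than the whole field) is an assumption.

```python
import re
from typing import Tuple

# Simplified illustrative patterns; a production scanner would be stricter.
PATTERNS = [
    re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),          # email
    re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),                             # IPv4
    re.compile(r"\b[0-9a-fA-F]{32,}\b"),                                    # hex secret
    re.compile(r"\b(?:password|token|secret|key)=\S+", re.IGNORECASE),      # keyword=value
]

PLACEHOLDER = "[REDACTED]"


def redact(text: str) -> Tuple[str, bool]:
    """Return the scrubbed text and a flag saying whether anything was replaced."""
    redacted = False
    for pattern in PATTERNS:
        text, count = pattern.subn(PLACEHOLDER, text)
        redacted = redacted or count > 0
    return text, redacted


clean, flagged = redact("contact bob@example.com from 10.0.0.1")
```

The returned flag plays the role of the `redacted` field: a caller would persist it alongside the scrubbed text and emit an audit event when it is true.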
Data Classification
Each pattern can be assigned a classification:
| Level | Meaning | Default |
|---|---|---|
| PUBLIC | Safe to share, no restrictions | - |
| INTERNAL | Internal use only | Project default |
| CONFIDENTIAL | Sensitive, restricted access | - |
Classification is set at learn time or updated via PUT /v1/governance/patterns/{key}/classify. The export endpoint can filter by classification.
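Filtering by classification can also be done client-side on exported records. The sketch below is one way to do it. The helper `filter_by_classification` is hypothetical, and treating a record with no classification as INTERNAL is an assumption based on the project default in the table above.

```python
from enum import Enum
from typing import Iterable, List, Set


class Classification(Enum):
    PUBLIC = "PUBLIC"
    INTERNAL = "INTERNAL"
    CONFIDENTIAL = "CONFIDENTIAL"


def filter_by_classification(
    records: Iterable[dict], allowed: Set[Classification]
) -> List[dict]:
    """Keep only records whose classification is in the allowed set."""
    return [
        r
        for r in records
        # Assumption: a missing classification falls back to INTERNAL.
        if Classification(r.get("classification", "INTERNAL")) in allowed
    ]


records = [
    {"task": "Render docs", "classification": "PUBLIC"},
    {"task": "Rotate keys", "classification": "CONFIDENTIAL"},
    {"task": "Unlabelled task"},
]
shareable = filter_by_classification(records, {Classification.PUBLIC})
```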
Sub-processors
| Provider | Data shared | Purpose | Region |
|---|---|---|---|
| OpenAI (opt-in) | Task + code content | LLM evaluation and embeddings | US |
| Anthropic (opt-in) | Task + code content | LLM evaluation | US |
| Hetzner Cloud | All data at rest | VM hosting | DE (FSN1) |
Data shared with LLM providers is governed by their respective DPAs. When using the local embeddings provider (sentence-transformers), no data is sent externally for embedding generation.
Security Controls for Data
| Control | Implementation |
|---|---|
| Encryption in transit | TLS 1.2+ (Caddy) for all API traffic; HTTPS for LLM API calls |
| Encryption at rest | Host-level (Hetzner disk encryption — see deployment guide) |
| Access control | RBAC (4 roles) + tenant/project scope isolation |
| Audit trail | Structured JSON audit log for all data access/mutation events |
| Data minimisation | Only data explicitly provided by the caller is stored |
| Right to erasure | DELETE /v1/governance/projects/{id} (GDPR Art. 17) |
| Data portability | GET /v1/governance/export (GDPR Art. 20) |
| Retention limits | Configurable TTL per tenant/project; default 365 days |
Backup and Recovery
See deployment.md for pg_dump procedures and RTO/RPO targets.
Short summary:
- RTO: 4 hours (VM restore from snapshot + pg_restore)
- RPO: 24 hours (daily pg_dump to off-site storage)
- Recommended: configure automated daily pg_dump to Hetzner Object Storage (S3-compatible)