Credential Storage Architecture (BYOK)
Engramia v0.7.0 · Classification: Public
Architecture specification for Bring Your Own Key (BYOK) credential storage. This document describes how Engramia stores, encrypts, resolves, and audits per-tenant LLM provider credentials.
1. Goals and non-goals
Goals
- Tenants supply their own LLM provider keys — Engramia never pays for LLM tokens consumed by a tenant
- At-rest encryption of API keys with master-key separation; no plaintext keys in the database
- Multi-provider support — OpenAI, Anthropic, Google Gemini, Ollama, OpenAI-compatible endpoints (Together/Groq/Fireworks/vLLM)
- Per-request resolution — credentials resolved from the authenticated tenant context, not from process-wide environment variables
- Graceful degradation — when no key is configured, fall back to Demo mode with a clear UX signal instead of failing
- Self-hosted parity: single-tenant self-hosters keep using OPENAI_API_KEY env vars; the BYOK layer is opt-in via env flag
- Pluggable backend: local AES-GCM by default; HashiCorp Vault Transit / AWS KMS / GCP KMS / Azure Key Vault available for enterprise
Non-goals
- LLM cost metering or budgeting — that is the provider's billing dashboard concern (Anthropic Console, OpenAI Usage)
- Key vending — Engramia does not issue or rotate provider-side keys; tenant manages them in OpenAI/Anthropic/Google consoles
- Provider failover orchestration — single primary provider per request; secondary chain is a future Phase 6.6 #2 feature
- Key sharing across tenants — keys are strictly tenant-scoped, no cross-tenant reuse
- Memoising LLM responses to reduce cost: out of scope (would conflict with the non-determinism of evaluate)
2. System overview
Agent / Dashboard
      │ HTTPS
      ▼
┌─── TRUST BOUNDARY ────────────────────────────────────────
│ Caddy (TLS 1.3)
│     │
│     ▼
│ FastAPI (auth, rate-limit, body-size)
│     │
│     ▼
│ Scope contextvar (tenant_id, project_id)
│     │
│     ▼
│ Memory facade ── make_llm()
│     │
│     ▼
│ CredentialResolver
│     │
│     ├──────────────────────┬────────────────────────┐
│     ▼                      ▼                        ▼
│ LRU cache           CredentialStore           DemoProvider
│ (provider           (DB + decrypt)            (no key)
│  instances)                │                        │
│     │                      ▼                        │
│     │                AESGCMCipher                   │
│     │                      │                        │
│     │                      ▼                        │
│     │             tenant_credentials                │
│     │             (encrypted_key,                   │
│     │              nonce, auth_tag)                 │
│     │                      │                        │
│     ▼                      ▼                        ▼
│ OpenAI/Anthropic/Gemini/Ollama                 Demo output
│ provider instance (api_key resolved)
│     │
└─────┼─────────────────────────────────────────────────────
      ▼
┌────────────────────┐
│ LLM provider API   │ ◀── tenant's quota
│ (HTTPS outbound)   │
└────────────────────┘
ENGRAMIA_CREDENTIALS_KEY (env, SOPS): master key for AES-GCM decryption, supplied to the process inside the trust boundary
Critical invariants:
- Master key (ENGRAMIA_CREDENTIALS_KEY) lives only in the operator's environment (SOPS-encrypted on disk, never in DB)
- tenant_credentials rows contain ciphertext + nonce + auth tag, which are useless without the master key
- Provider instances are cached per (tenant_id, role), never globally
- Cache is invalidated on credential update via CredentialStore.invalidate(tenant_id)
3. Threat model
| Threat | Mitigation |
|---|---|
| Database dump leak | Keys stored as AES-256-GCM ciphertext. Without ENGRAMIA_CREDENTIALS_KEY (only in operator env), ciphertext is opaque. AAD {tenant_id}:{provider}:{purpose} prevents record substitution between tenants. |
| Master key leak from env | Operator rotates master key → re-encrypt batch via Alembic migration; old key_version rows are decrypted and re-saved with new version. Old master key is destroyed. |
| Backup leak (pg_dump) | Backups inherit ciphertext-only storage. Master key is not in DB → backup alone is not exploitable. Backups are encrypted at rest (Hetzner Storage Box) as a second layer. |
| Cross-tenant key access via API bug | CredentialStore queries always include WHERE tenant_id = :scope_tenant_id from contextvar. Authorization is at the boundary — require_auth sets scope before any handler runs. Test suite has test_cross_tenant_isolation.py that asserts no leakage. |
| Insider threat (Engramia operator reading keys) | Local backend: operator with both DB access AND ENGRAMIA_CREDENTIALS_KEY env can decrypt. Mitigated by separation: DB credentials and credentials master key live in separate SOPS files with separate audit logs (Ops/secrets/.env.prod.enc vs Ops/secrets/credentials-key.enc). Vault backend: operator never sees plaintext, only Vault Transit decrypt API does. |
| Key replay via stolen Bearer token | Bearer token rotation, revocation via DELETE /v1/keys/{id}, 60s TTL cache window. Tenant should rotate the LLM provider key (in their OpenAI/Anthropic console) if Engramia API key is compromised — Engramia cannot rotate provider-side keys for them. |
| Memory dump while process running | Plaintext keys exist only inside OpenAIProvider/AnthropicProvider instance attributes during a request. Python does not zero memory on object destruction; this is a known acceptable risk vs Vault transit, which would require an HTTPS round-trip per request. Vault backend mitigates this for Enterprise tier. |
| Side-channel: timing of "key valid?" checks | validator.py uses constant-time comparison; provider validation pings (/models endpoint) are rate-limited to 1/min per tenant to avoid amplification attacks. |
| Demo mode abuse (free LLM via shared infra) | DemoMeter enforces hard cap (50 calls/month per tenant). Demo responses are deterministic mocks, not real LLM calls — Engramia spends $0 on Demo. |
| Logging leak | Audit log records key_fingerprint (sk-...abcd, last 4 chars) only. Plaintext keys never enter logs, exception traces, or telemetry. Pre-commit hook checks for OPENAI_API_KEY=sk- patterns in code/configs. |
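The fingerprint format mentioned in the logging row (sk-...abcd) can be sketched as below. The function name, the 3-character prefix, and the handling of very short inputs are assumptions made for illustration; the source only states that the last 4 characters are kept.

```python
def key_fingerprint(api_key: str, prefix_len: int = 3, suffix_len: int = 4) -> str:
    """Return a display-safe fingerprint such as 'sk-...abcd'.

    prefix_len and the short-key fallback are assumptions of this sketch.
    """
    if len(api_key) <= prefix_len + suffix_len:
        # Too short to mask meaningfully: reveal nothing at all.
        return "..."
    return f"{api_key[:prefix_len]}...{api_key[-suffix_len:]}"


print(key_fingerprint("sk-proj-0123456789abcd"))  # sk-...abcd
```

Only the fingerprint would then be passed to audit logging, so the plaintext never reaches a log formatter.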
4. Data model
Database schema (Alembic migration 023_tenant_credentials)
CREATE TABLE tenant_credentials (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
tenant_id TEXT NOT NULL REFERENCES cloud_users(tenant_id) ON DELETE CASCADE,
provider TEXT NOT NULL, -- openai | anthropic | gemini | ollama | openai_compat
purpose TEXT NOT NULL, -- llm | embedding | both
encrypted_key BYTEA NOT NULL, -- AES-256-GCM ciphertext
nonce BYTEA NOT NULL, -- 12 bytes (96 bits, GCM standard)
auth_tag BYTEA NOT NULL, -- 16 bytes (GCM tag)
key_version SMALLINT NOT NULL DEFAULT 1, -- master key rotation marker
key_fingerprint TEXT NOT NULL, -- "sk-...abcd" — last 4 chars for UI display
base_url TEXT, -- non-null for ollama / openai_compat
default_model TEXT, -- e.g. "gpt-4.1" — overridable per request
default_embed_model TEXT, -- e.g. "text-embedding-3-small"
role_models JSONB, -- Business+ tier: {"eval": "gpt-4.1-mini", "evolve": "claude-opus-4-7"}
status TEXT NOT NULL DEFAULT 'active', -- active | revoked | invalid
last_used_at TIMESTAMPTZ,
last_validated_at TIMESTAMPTZ,
last_validation_error TEXT, -- nullable; populated when status=invalid
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
created_by TEXT NOT NULL, -- cloud_users.id of creator
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);
CREATE UNIQUE INDEX uq_tenant_credentials_provider_purpose
ON tenant_credentials (tenant_id, provider, purpose);
CREATE INDEX ix_tenant_credentials_tenant ON tenant_credentials (tenant_id);
CREATE INDEX ix_tenant_credentials_status ON tenant_credentials (status) WHERE status != 'active';
Why purpose is stored separately from provider:
A tenant may want OpenAI for embeddings (text-embedding-3-small is best-in-class) but Anthropic for LLM (Claude Sonnet for evaluate). The UNIQUE index on (tenant_id, provider, purpose) allows two rows per tenant: (openai, embedding) and (anthropic, llm).
Why role_models is JSONB and not a separate table:
Per-role routing is a Business-tier feature that adds at most 5 entries per credential (eval, coder, architect, evolve, default). A separate table adds a join with no benefit; JSONB remains queryable via the ->> operator if ever needed.
Pydantic models (engramia/credentials/models.py)
from datetime import datetime
from typing import Literal

from pydantic import BaseModel, Field

ProviderType = Literal["openai", "anthropic", "gemini", "ollama", "openai_compat"]
PurposeType = Literal["llm", "embedding", "both"]
StatusType = Literal["active", "revoked", "invalid"]

# _PROVIDER_DEFAULT_MODELS (not shown) maps each ProviderType to its fallback model name.


class TenantCredential(BaseModel):
    """In-memory representation of a tenant credential row.

    The plaintext api_key is populated only after CredentialResolver decrypts;
    serialisation excludes it (Pydantic field exclude=True).
    """

    id: str
    tenant_id: str
    provider: ProviderType
    purpose: PurposeType
    api_key: str = Field(exclude=True)  # plaintext, never serialised
    key_fingerprint: str
    base_url: str | None = None
    default_model: str | None = None
    default_embed_model: str | None = None
    role_models: dict[str, str] = Field(default_factory=dict)
    status: StatusType = "active"
    last_used_at: datetime | None = None
    last_validated_at: datetime | None = None

    def model_for_role(self, role: str) -> str:
        """Resolve the model name for a logical role, falling back to default."""
        return (
            self.role_models.get(role)
            or self.default_model
            or _PROVIDER_DEFAULT_MODELS[self.provider]
        )


class CredentialCreate(BaseModel):
    """API input: POST /v1/credentials body."""

    provider: ProviderType
    purpose: PurposeType
    api_key: str = Field(min_length=8, max_length=512)
    base_url: str | None = None  # required for ollama / openai_compat
    default_model: str | None = None


class CredentialPublicView(BaseModel):
    """API output: what /v1/credentials returns (no plaintext key).

    Pydantic regenerates this from TenantCredential without api_key
    via model_dump(exclude={"api_key"}).
    """

    id: str
    provider: ProviderType
    purpose: PurposeType
    key_fingerprint: str
    base_url: str | None
    default_model: str | None
    status: StatusType
    last_used_at: datetime | None
    last_validated_at: datetime | None
    created_at: datetime
5. Encryption design
Cipher choice: AES-256-GCM
| Property | Value | Rationale |
|---|---|---|
| Algorithm | AES-256-GCM | NIST-approved AEAD, widely audited, hardware-accelerated (AES-NI) |
| Key size | 256 bits | Meets PCI-DSS / HIPAA / FedRAMP at-rest requirements |
| Nonce size | 96 bits (12 bytes) | GCM standard; random per record (never reused with same key) |
| Auth tag size | 128 bits (16 bytes) | GCM standard |
| AAD (additional authenticated data) | f"{tenant_id}:{provider}:{purpose}".encode() | Prevents an attacker who swaps encrypted_key between rows from passing decryption |
Library: cryptography>=42 (already a dependency for cloud_auth.py JWT). No new deps.
# engramia/credentials/crypto.py (pseudocode)
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM


class AESGCMCipher:
    def __init__(self, master_key: bytes, key_version: int = 1) -> None:
        if len(master_key) != 32:
            raise ValueError("Master key must be 32 bytes (256 bits)")
        self._aesgcm = AESGCM(master_key)
        self.key_version = key_version

    def encrypt(self, plaintext: str, aad: bytes) -> tuple[bytes, bytes, bytes]:
        # Fresh random 96-bit nonce for every record.
        nonce = os.urandom(12)
        ciphertext_with_tag = self._aesgcm.encrypt(nonce, plaintext.encode(), aad)
        # cryptography library returns ciphertext || tag concatenated
        ciphertext, auth_tag = ciphertext_with_tag[:-16], ciphertext_with_tag[-16:]
        return ciphertext, nonce, auth_tag

    def decrypt(self, ciphertext: bytes, nonce: bytes, auth_tag: bytes, aad: bytes) -> str:
        # Re-join ciphertext and tag; raises InvalidTag on any mismatch.
        ciphertext_with_tag = ciphertext + auth_tag
        plaintext = self._aesgcm.decrypt(nonce, ciphertext_with_tag, aad)
        return plaintext.decode()
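To illustrate why the AAD binds a ciphertext to its row, here is a minimal sketch that encrypts a placeholder key for one tenant and then tries to decrypt it under another tenant's AAD. It uses the cryptography library's AESGCM class directly; the helper names aad_for and can_decrypt_as, the tenant ids, and the throwaway key are assumptions for the demo, not part of the Engramia codebase.

```python
import os

from cryptography.exceptions import InvalidTag
from cryptography.hazmat.primitives.ciphers.aead import AESGCM


def aad_for(tenant_id: str, provider: str, purpose: str) -> bytes:
    """Build the AAD string in the {tenant_id}:{provider}:{purpose} format."""
    return f"{tenant_id}:{provider}:{purpose}".encode()


# Throwaway key for the demo; a real deployment would load the master key from env.
aesgcm = AESGCM(AESGCM.generate_key(bit_length=256))
nonce = os.urandom(12)
blob = aesgcm.encrypt(nonce, b"sk-demo-not-a-real-key", aad_for("tenant-a", "openai", "llm"))


def can_decrypt_as(tenant_id: str) -> bool:
    """Try to decrypt tenant-a's blob as if it were stored under tenant_id."""
    try:
        aesgcm.decrypt(nonce, blob, aad_for(tenant_id, "openai", "llm"))
        return True
    except InvalidTag:
        return False


print(can_decrypt_as("tenant-a"))  # True: AAD matches the row it was encrypted for
print(can_decrypt_as("tenant-b"))  # False: a swapped row fails the tag check
```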
Master key management
| Aspect | Design |
|---|---|
| Storage | Env var ENGRAMIA_CREDENTIALS_KEY — base64-encoded 32 bytes |
| Source of truth | SOPS-encrypted Ops/secrets/credentials-key.enc (separate from .env.prod.enc) |
| Generation | python -c "import os, base64; print(base64.b64encode(os.urandom(32)).decode())" |
| Rotation cadence | On compromise, on key custodian change, or yearly best-practice |
| Rotation procedure | Alembic migration 024_rotate_master_key: read all rows, decrypt with old key, encrypt with new key, increment key_version. Atomic per row, idempotent. |
| Backup | Operator MUST back up master key separately from DB backups. Loss of master key = permanent loss of all stored credentials (tenants must re-enter keys). |
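A minimal sketch of loading and validating the master key at startup, assuming the base64-encoded 32-byte format from the table above. The function name load_master_key and the exact error messages are hypothetical.

```python
import base64
import binascii
import os
from collections.abc import Mapping


def load_master_key(
    env: Mapping[str, str] = os.environ,
    var: str = "ENGRAMIA_CREDENTIALS_KEY",
) -> bytes:
    """Decode the master key from env and check that it is exactly 32 bytes."""
    raw = env.get(var)
    if not raw:
        raise RuntimeError(f"{var} is not set")
    try:
        key = base64.b64decode(raw, validate=True)
    except binascii.Error as exc:
        raise RuntimeError(f"{var} is not valid base64") from exc
    if len(key) != 32:
        raise RuntimeError(f"{var} must decode to 32 bytes, got {len(key)}")
    return key


# Generate a key the same way as the one-liner in the table, then load it back.
demo_env = {"ENGRAMIA_CREDENTIALS_KEY": base64.b64encode(os.urandom(32)).decode()}
print(len(load_master_key(demo_env)))  # 32
```

Failing here, before any request is served, keeps a malformed key from surfacing later as per-row decryption errors.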
Key version field
Allows zero-downtime rotation:
- Operator generates new key, sets ENGRAMIA_CREDENTIALS_KEY_NEW (in addition to current ENGRAMIA_CREDENTIALS_KEY)
- Engramia binary reads both, decrypts with version match (key_version = 1 → old key, key_version = 2 → new key)
- Background migration re-encrypts all key_version = 1 rows with new key, sets key_version = 2
- After migration completes, operator drops ENGRAMIA_CREDENTIALS_KEY and renames _NEW to canonical
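The version-matched lookup in the rotation steps can be sketched as a small keyring. The helper names build_keyring and key_for_row, and the convention that the _NEW key is always current_version + 1, are assumptions for illustration.

```python
import base64
from collections.abc import Mapping


def build_keyring(env: Mapping[str, str], current_version: int = 1) -> dict[int, bytes]:
    """Map key_version -> master key bytes from the two env vars."""
    keyring = {current_version: base64.b64decode(env["ENGRAMIA_CREDENTIALS_KEY"])}
    new_key = env.get("ENGRAMIA_CREDENTIALS_KEY_NEW")
    if new_key:
        # During a rotation window the new key serves rows already re-encrypted.
        keyring[current_version + 1] = base64.b64decode(new_key)
    return keyring


def key_for_row(keyring: Mapping[int, bytes], key_version: int) -> bytes:
    """Pick the master key that matches a row's key_version."""
    try:
        return keyring[key_version]
    except KeyError:
        raise LookupError(f"no master key loaded for key_version={key_version}") from None


env = {
    "ENGRAMIA_CREDENTIALS_KEY": base64.b64encode(b"A" * 32).decode(),
    "ENGRAMIA_CREDENTIALS_KEY_NEW": base64.b64encode(b"B" * 32).decode(),
}
keyring = build_keyring(env)
print(sorted(keyring))  # [1, 2]
```

Rows with key_version = 1 keep decrypting with the old key while the background migration runs, which is what makes the rotation zero-downtime.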
6. Provider abstraction
The existing ABC in engramia/providers/base.py:18 (LLMProvider.call(prompt, system, role)) is unchanged. BYOK changes only how providers are instantiated — they take api_key as a constructor parameter instead of reading from os.environ.
Existing → BYOK delta
from openai import OpenAI

from engramia.providers.base import LLMProvider


# Before (existing, will be deprecated for cloud mode):
class OpenAIProvider(LLMProvider):
    def __init__(self, model="gpt-4.1", max_retries=3, timeout=30.0):
        # implicitly reads OPENAI_API_KEY from env via openai SDK
        ...


# After (BYOK):
class OpenAIProvider(LLMProvider):
    def __init__(self, api_key: str, model="gpt-4.1", base_url: str | None = None,
                 max_retries=3, timeout=30.0):
        self._client = OpenAI(api_key=api_key, base_url=base_url, timeout=timeout)
api_key defaulting to os.environ.get("OPENAI_API_KEY") is preserved for self-hosted single-tenant mode (ENGRAMIA_BYOK_ENABLED=false, see §11).
New providers
| File | Class | Status |
|---|---|---|
| providers/gemini.py | GeminiProvider, GeminiEmbeddings | NEW: Google Gen AI SDK |
| providers/ollama.py | OllamaProvider (subclass of OpenAIProvider) | NEW: base_url=http://host:11434/v1, longer timeouts, Authorization: Bearer ollama placeholder |
| providers/openai_compat.py | OpenAICompatProvider (subclass) | NEW: generic for Together/Groq/Fireworks/vLLM with custom base_url |
| providers/demo.py | DemoProvider, DemoEmbeddings | NEW: deterministic mocked responses, used when no credential exists |
All implement existing LLMProvider / EmbeddingProvider ABCs without modification.
7. Per-request resolution flow
HTTP request arrives
│
▼
require_auth (existing dependency)
│
├── extracts Bearer token / OIDC JWT / cloud JWT
├── resolves AuthContext(tenant_id, project_id, role)
└── set_scope(Scope(tenant_id, project_id)) # contextvar
│
▼
Route handler (e.g., POST /v1/evaluate)
│
▼
Memory.evaluate(...) → MultiEvaluator(num_evals=3)
│
▼
For each of 3 concurrent evaluations:
│
▼
LLMProvider.call(prompt, system, role="eval")
│
│ ↑ This LLMProvider instance was injected at Memory construction.
│ But Memory is built per-request via Depends(get_memory) which calls:
│
▼
get_memory():
return Memory(
storage=make_storage(), # PostgresStorage — scope-filtered
embeddings=make_embeddings(), # local or BYOK embed provider
llm=make_llm(), # ← THE BYOK ENTRYPOINT
)
│
▼
make_llm():
scope = get_scope() # contextvar
return _build_llm_for_tenant(scope.tenant_id, role="default")
│
▼
_build_llm_for_tenant(tenant_id, role) [LRU cached, 512 entries]:
cred = CredentialResolver.resolve(tenant_id, purpose="llm")
if cred is None:
return DemoProvider()
return _construct_provider(cred, role)
│
▼
CredentialResolver.resolve(tenant_id, purpose="llm"):
row = CredentialStore.get(tenant_id, purpose="llm")
if row is None or row.status != "active":
return None
aad = f"{row.tenant_id}:{row.provider}:{row.purpose}".encode()
api_key_plaintext = AESGCMCipher(_master_key).decrypt(
row.encrypted_key, row.nonce, row.auth_tag, aad
)
CredentialStore.touch_last_used(row.id) # async, fire-and-forget
return TenantCredential(api_key=api_key_plaintext, **row_fields)
│
▼
_construct_provider(cred, role):
model = cred.model_for_role(role)
if cred.provider == "openai":
return OpenAIProvider(api_key=cred.api_key, model=model, base_url=cred.base_url)
if cred.provider == "anthropic":
return AnthropicProvider(api_key=cred.api_key, model=model)
if cred.provider == "gemini":
return GeminiProvider(api_key=cred.api_key, model=model)
if cred.provider == "ollama":
return OllamaProvider(api_key="ollama", base_url=cred.base_url, model=model)
if cred.provider == "openai_compat":
return OpenAIProvider(api_key=cred.api_key, base_url=cred.base_url, model=model)
raise ProviderError(f"unknown provider: {cred.provider}")
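The flow above depends on the scope contextvar staying isolated between concurrent requests. The sketch below shows that property with the standard contextvars module; the Scope dataclass, the set_scope/get_scope bodies, and handle_request are simplified stand-ins, not the real Engramia implementations.

```python
import asyncio
from contextvars import ContextVar
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    tenant_id: str
    project_id: str


_scope: ContextVar[Scope] = ContextVar("engramia_scope")


def set_scope(scope: Scope) -> None:
    _scope.set(scope)


def get_scope() -> Scope:
    return _scope.get()  # raises LookupError if no scope was set


async def handle_request(tenant_id: str) -> str:
    """Stand-in for require_auth + handler: set the scope, then read it back later."""
    set_scope(Scope(tenant_id, "default"))
    await asyncio.sleep(0)  # yield so the two requests interleave
    return get_scope().tenant_id


async def main() -> list[str]:
    # Each coroutine runs in its own task, and each task gets its own context copy.
    return await asyncio.gather(handle_request("tenant-a"), handle_request("tenant-b"))


results = asyncio.run(main())
print(results)  # ['tenant-a', 'tenant-b']
```

Because every task works on a copy of the context, a make_llm() call deep in the stack sees the tenant of its own request even while other requests are in flight.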
LRU cache details
| Aspect | Value |
|---|---|
| Cache key | (tenant_id, role) |
| Cache value | Provider instance (holds plaintext api_key in memory) |
| Size | 512 entries (≈ 100 active tenants × 5 roles) |
| TTL | None — invalidation is event-driven via CredentialStore.invalidate(tenant_id) |
| Invalidation triggers | POST /v1/credentials, DELETE /v1/credentials/{id}, PATCH /v1/credentials/{id} |
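| Implementation | LRU keyed by (tenant_id, role) with per-tenant eviction. Note that functools.lru_cache exposes only a whole-cache cache_clear(), so evicting one tenant's entries needs a custom structure (for example an OrderedDict-backed LRU) or a full cache clear. |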
Why no TTL: TTL would force re-decryption every N minutes for active tenants, increasing CPU load. Event-driven invalidation is correct because credentials only change via tenant action, which we capture.
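One way to get event-driven, per-tenant invalidation is a small LRU built on OrderedDict. This is a sketch under assumptions: the class name TenantLRU and its methods are hypothetical, and it is not thread-safe as written.

```python
from collections import OrderedDict
from typing import Callable


class TenantLRU:
    """LRU cache keyed by (tenant_id, role) with per-tenant invalidation."""

    def __init__(self, maxsize: int = 512) -> None:
        self._maxsize = maxsize
        self._items: OrderedDict[tuple[str, str], object] = OrderedDict()

    def get_or_build(self, tenant_id: str, role: str, build: Callable[[], object]) -> object:
        key = (tenant_id, role)
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        value = build()
        self._items[key] = value
        if len(self._items) > self._maxsize:
            self._items.popitem(last=False)  # evict the least recently used entry
        return value

    def invalidate(self, tenant_id: str) -> int:
        """Drop every role entry for one tenant; return how many were removed."""
        stale = [key for key in self._items if key[0] == tenant_id]
        for key in stale:
            del self._items[key]
        return len(stale)

    def __len__(self) -> int:
        return len(self._items)


cache = TenantLRU(maxsize=512)
cache.get_or_build("tenant-a", "eval", lambda: "provider-a-eval")
cache.get_or_build("tenant-a", "default", lambda: "provider-a-default")
cache.get_or_build("tenant-b", "default", lambda: "provider-b-default")
print(cache.invalidate("tenant-a"), len(cache))  # 2 1
```

Invalidating by tenant rather than by exact key matters because a credential change affects every role-specific provider instance built from it.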
8. Demo mode
When CredentialResolver.resolve() returns None, make_llm() returns a DemoProvider. This applies to:
- New tenants who skipped "Add LLM key" in onboarding
- Tenants whose key was revoked or marked invalid
- Self-hosted single-tenant deployments without env keys (developer mode)
DemoProvider behaviour
import json


class DemoProvider(LLMProvider):
    # Canned responses keyed by role; nothing here calls a real LLM.
    DEMO_RESPONSES = {
        "eval": json.dumps({
            "task_alignment": 7, "code_quality": 7, "workspace_usage": 7,
            "robustness": 6, "overall": 6.8,
            "feedback": "DEMO MODE — add your LLM API key in Settings → LLM Providers to get real evaluations."
        }),
        "default": "DEMO RESPONSE — add your LLM API key in Settings → LLM Providers to enable real LLM features."
    }

    def call(self, prompt: str, system: str | None = None, role: str = "default") -> str:
        scope = get_scope()
        # Enforce the monthly demo cap before returning a mock.
        if not DemoMeter.try_increment(scope.tenant_id):
            raise QuotaExceededError(
                "Demo mode quota exhausted (50 calls/month). "
                "Add your LLM API key to continue."
            )
        return self.DEMO_RESPONSES.get(role, self.DEMO_RESPONSES["default"])
DemoMeter
| Aspect | Value |
|---|---|
| Backing store | New table demo_call_meter (tenant_id, year_month, count) — or reuse existing usage_counters table |
| Cap | 50 calls/month per tenant |
| Reset | Calendar month boundary (UTC) |
| Failure on cap | HTTP 429 with error_code=DEMO_QUOTA_EXCEEDED, hint to add real key |
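The cap and the calendar-month reset can be sketched with an in-memory counter keyed by (tenant_id, year_month). The class below is a hypothetical stand-in for the database-backed meter; it shares only the try_increment name with the DemoProvider code, and the injectable now parameter is an assumption added to make the month boundary testable.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Optional


class InMemoryDemoMeter:
    """Per-tenant monthly counter; a real meter would persist counts in the DB."""

    def __init__(self, cap: int = 50) -> None:
        self._cap = cap
        self._counts: dict[tuple[str, str], int] = defaultdict(int)

    def try_increment(self, tenant_id: str, now: Optional[datetime] = None) -> bool:
        """Count one demo call; return False once the monthly cap is reached."""
        moment = (now or datetime.now(timezone.utc)).astimezone(timezone.utc)
        key = (tenant_id, moment.strftime("%Y-%m"))  # bucket by UTC calendar month
        if self._counts[key] >= self._cap:
            return False
        self._counts[key] += 1
        return True


meter = InMemoryDemoMeter(cap=50)
allowed = sum(meter.try_increment("tenant-a") for _ in range(60))
print(allowed)  # 50
```

Keying on the year-month string means the reset needs no scheduled job: the first call of a new month simply starts a fresh bucket.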
UI signaling
API responses include extra fields when in demo mode:
{
"median_score": 6.8,
"variance": 0.0,
"feedback": "DEMO MODE — ...",
"_meta": {
"mode": "demo",
"demo_calls_used": 12,
"demo_calls_limit": 50,
"upgrade_link": "https://app.engramia.dev/settings/llm-providers"
}
}
Dashboard reads _meta.mode == "demo" and shows persistent yellow banner: "🟡 Demo mode — eval results are simulated. [Add LLM key]".
9. API surface
Endpoints live under /v1/credentials/ (new file engramia/api/credentials.py, mounted in app.py).
| Method | Path | Auth | Purpose |
|---|---|---|---|
| POST | /v1/credentials | admin+ | Create or replace credential for (provider, purpose) |
| GET | /v1/credentials | admin+ | List all credentials for current tenant (no plaintext) |
| GET | /v1/credentials/{id} | admin+ | Get single credential metadata (no plaintext) |
| PATCH | /v1/credentials/{id} | admin+ | Update default_model, role_models, base_url (NOT api_key; use POST to replace) |
| DELETE | /v1/credentials/{id} | admin+ | Soft-delete (status=revoked); preserves audit trail |
| POST | /v1/credentials/{id}/validate | admin+ | Ping provider's /models endpoint to verify key still works; rate-limited 1/min/tenant |
Key rules at the API boundary
- api_key field is write-only: it never appears in any response, even to the creator. This forces tenants to store their key in their own password manager, not rely on Engramia as a key vault.
- POST /v1/credentials with an existing (tenant_id, provider, purpose) triple replaces the previous row (UPSERT). The previous key_fingerprint is logged in audit before replacement.
- PATCH cannot change api_key. To rotate, the tenant POSTs a new key, which UPSERTs.
- GET responses include key_fingerprint (sk-...abcd) so the tenant can identify which key is active without seeing the plaintext.
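The UPSERT rule can be tried out with SQLite, which accepts the same ON CONFLICT ... DO UPDATE form as PostgreSQL (SQLite 3.24 or newer). The table below is a deliberately reduced, hypothetical version of tenant_credentials, and the upsert helper is an illustration rather than the CredentialStore API.

```python
import sqlite3
from typing import Optional

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE tenant_credentials (
        tenant_id TEXT NOT NULL,
        provider TEXT NOT NULL,
        purpose TEXT NOT NULL,
        key_fingerprint TEXT NOT NULL,
        UNIQUE (tenant_id, provider, purpose)
    )
    """
)


def upsert(tenant_id: str, provider: str, purpose: str, fingerprint: str) -> Optional[str]:
    """Insert or replace a credential; return the previous fingerprint for the audit log."""
    previous = conn.execute(
        "SELECT key_fingerprint FROM tenant_credentials "
        "WHERE tenant_id = ? AND provider = ? AND purpose = ?",
        (tenant_id, provider, purpose),
    ).fetchone()
    conn.execute(
        "INSERT INTO tenant_credentials (tenant_id, provider, purpose, key_fingerprint) "
        "VALUES (?, ?, ?, ?) "
        "ON CONFLICT (tenant_id, provider, purpose) "
        "DO UPDATE SET key_fingerprint = excluded.key_fingerprint",
        (tenant_id, provider, purpose, fingerprint),
    )
    conn.commit()
    return previous[0] if previous else None


print(upsert("tenant-a", "openai", "llm", "sk-...aaaa"))  # None
print(upsert("tenant-a", "openai", "llm", "sk-...bbbb"))  # sk-...aaaa
```

Reading the old fingerprint before the write is what lets the audit entry record both the replaced and the new key.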
Validation flow
POST /v1/credentials
│
▼
1. Pydantic validates input shape
2. Tier gate: provider in {openai, anthropic, gemini, openai_compat} → all tiers
provider == ollama → all tiers (with ⚠️ "use-at-own-risk" warning header)
3. Optional: synchronous validation ping to provider.list_models()
- If the ping succeeds: status=active
- If the ping fails (401/403): reject with 400 "Invalid API key"
- If the ping is rate-limited or returns 5xx: status=active anyway, warn user
4. Encrypt with AESGCMCipher
5. INSERT ... ON CONFLICT (tenant_id, provider, purpose) DO UPDATE
6. Invalidate LRU cache for tenant_id
7. Audit log: CREDENTIAL_CREATED with key_fingerprint
8. Return 201 with CredentialPublicView
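Step 3 of the flow maps the provider's HTTP status to a decision. A minimal sketch of that mapping follows; PingOutcome and classify_ping are hypothetical names, and the handling of statuses the flow does not mention (for example 404) is an assumption that treats them as inconclusive.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PingOutcome:
    accept: bool                 # should the credential be stored?
    row_status: Optional[str]    # status column value when accepted
    warning: Optional[str]       # message surfaced to the user, if any


def classify_ping(http_status: int) -> PingOutcome:
    """Translate the validation ping's HTTP status into a storage decision."""
    if 200 <= http_status < 300:
        return PingOutcome(True, "active", None)
    if http_status in (401, 403):
        return PingOutcome(False, None, "Invalid API key")
    if http_status == 429 or http_status >= 500:
        return PingOutcome(True, "active", "Provider could not confirm the key; stored unverified")
    # Assumption: anything else is inconclusive, so store the key but warn.
    return PingOutcome(True, "active", f"Unexpected provider status {http_status}; stored unverified")


for status in (200, 401, 429, 503):
    print(status, classify_ping(status).accept)
```

Accepting on 429 and 5xx avoids blocking onboarding when the provider is having a bad moment; the later first-call 401 handling still catches a key that was never valid.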
10. Failure modes
| Scenario | Behaviour |
|---|---|
| Master key env var unset on startup | API refuses to start (RuntimeError: ENGRAMIA_CREDENTIALS_KEY required when ENGRAMIA_BYOK_ENABLED=true). Fail fast. |
| Master key wrong (decryption fails) | All LLM calls fall through to DemoProvider with a critical-level audit log "MASTER_KEY_DECRYPT_FAILURE". Operator alerts fire. Tenants see degraded service, not data loss. |
| Tenant's key revoked at provider side | First call returns provider's 401 → CredentialStore.mark_invalid(id, error="401 Unauthorized") → subsequent calls fall through to DemoProvider. Email notification to tenant admin. |
| Tenant's key over quota at provider side | Provider returns 429 → propagate to caller as 502 "LLM provider rate-limited" (do NOT mark credential invalid — quota will reset). |
| Tenant deletes credential mid-request | Cache holds provider instance until current request completes; next request rebuilds → DemoProvider. No mid-flight failure. |
| Two tenants share the same plaintext API key | Allowed (Engramia doesn't deduplicate). Each row has independent encryption with own nonce. Provider-side quota is shared (provider's problem, not Engramia's). |
| Credentials table corruption | New tenant requests fall through to demo. Existing cached provider instances continue working until cache eviction. Audit log captures the corruption event. |
| cryptography lib InvalidTag (tampering / AAD or nonce mismatch) | Decryption raises cryptography.exceptions.InvalidTag → caught, logged as CREDENTIAL_TAMPERING_SUSPECTED, credential marked invalid, security alert fires. The AAD makes row swaps detectable: a ciphertext moved between tenants fails the tag check immediately. |
11. Self-hosted vs. cloud mode
BYOK is opt-in via env flag. Self-hosted single-tenant deployments shouldn't have to set up master keys, manage credentials tables, or use the dashboard — they already have their key in OPENAI_API_KEY.
Mode selection (ENGRAMIA_BYOK_ENABLED)
| Mode | Default | Behaviour |
|---|---|---|
| false (self-hosted, default) | when ENGRAMIA_DATABASE_URL is unset OR cloud_users table is empty | Existing path: make_llm() reads OPENAI_API_KEY / ANTHROPIC_API_KEY from env. tenant_credentials table is unused. |
| true (cloud) | enabled in Ops/.env.prod for api.engramia.dev | New path: make_llm() resolves per-tenant credential from DB; falls back to DemoProvider if no row. Env vars OPENAI_API_KEY etc. are ignored at runtime (used only as bootstrap fallback for the default tenant in dev/staging). |
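A sketch of how the flag and the fail-fast master-key check from the failure-modes table could combine at startup. The accepted spellings of the flag and the function names byok_enabled and resolve_mode are assumptions; only the env var names and the error text come from this document.

```python
from collections.abc import Mapping

_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"", "0", "false", "no", "off"}


def byok_enabled(env: Mapping[str, str]) -> bool:
    """Parse ENGRAMIA_BYOK_ENABLED; unset means false (self-hosted default)."""
    raw = env.get("ENGRAMIA_BYOK_ENABLED", "false").strip().lower()
    if raw in _TRUE:
        return True
    if raw in _FALSE:
        return False
    raise ValueError(f"unrecognised ENGRAMIA_BYOK_ENABLED value: {raw!r}")


def resolve_mode(env: Mapping[str, str]) -> str:
    """Return 'env-keys' or 'byok', refusing to start if BYOK lacks a master key."""
    if not byok_enabled(env):
        return "env-keys"
    if not env.get("ENGRAMIA_CREDENTIALS_KEY"):
        raise RuntimeError(
            "ENGRAMIA_CREDENTIALS_KEY required when ENGRAMIA_BYOK_ENABLED=true"
        )
    return "byok"


print(resolve_mode({"OPENAI_API_KEY": "sk-demo"}))  # env-keys
```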
Hybrid mode (Enterprise self-hosted)
Multi-tenant self-hosted instances (rare: only for Enterprise customers running multiple internal product teams as separate tenants) can enable BYOK with a local master key. They are responsible for KMS integration and key rotation; see Production Hardening for documentation.
12. Migration from current state
Today's state (v0.6.x): single-tenant cloud, server-side OPENAI_API_KEY in Ops/.env.prod is used for all tenants.
Migration plan:
| Step | Action | Risk |
|---|---|---|
| 1 | Deploy v0.7.0 with ENGRAMIA_BYOK_ENABLED=false (no behaviour change) | None: old code path |
| 2 | Run Alembic migration 023_tenant_credentials (creates table, no data) | None: additive only |
| 3 | Generate master key, store in Ops/secrets/credentials-key.enc (SOPS) | None: not yet used |
| 4 | Deploy v0.7.1 with ENGRAMIA_CREDENTIALS_KEY set, but ENGRAMIA_BYOK_ENABLED=false | None: flag still off |
| 5 | Email all existing tenants (n=0 today, low risk): "Add your LLM key in dashboard before YYYY-MM-DD or your account will switch to demo mode" | UX risk only |
| 6 | After 14-day grace, deploy v0.7.2 with ENGRAMIA_BYOK_ENABLED=true | Tenants without keys land in demo mode |
| 7 | Remove OPENAI_API_KEY from Ops/.env.prod (it's no longer used) | None: clean-up |
Self-hosters: unchanged path (ENGRAMIA_BYOK_ENABLED defaults to false). They need to do nothing.
Existing paying customers: zero (per the internal project_byok_strategy.md notes), so a clean break is safe.
13. Audit and observability
Audit log events
| Event type | When | Detail fields |
|---|---|---|
| CREDENTIAL_CREATED | POST /v1/credentials succeeds | provider, purpose, key_fingerprint, base_url, default_model |
| CREDENTIAL_REPLACED | POST UPSERT replaces existing | old_key_fingerprint, new_key_fingerprint |
| CREDENTIAL_DELETED | DELETE /v1/credentials/{id} | provider, purpose, key_fingerprint |
| CREDENTIAL_VALIDATED | POST /v1/credentials/{id}/validate | provider, success |
| CREDENTIAL_MARKED_INVALID | First-call provider returned 401/403 | provider, error_message |
| CREDENTIAL_DECRYPT_FAILURE | AAD mismatch or cipher tag invalid | row_id, expected_aad |
| MASTER_KEY_DECRYPT_FAILURE | Cipher initialised but cannot decrypt any row | sample_row_id |
| DEMO_MODE_FALLBACK | First request for tenant with no active credential | tenant_id (rate-limited to 1/hour to avoid log spam) |
| DEMO_QUOTA_EXCEEDED | DemoMeter rejects 51st call of the month | tenant_id, calls_used |
All events go to the existing audit_log table (per governance/audit_scrubber.py retention rules).
Prometheus metrics
engramia_credentials_total{provider, status} gauge
engramia_credential_resolutions_total{tenant_tier, result} counter
result ∈ {hit, miss_demo, miss_invalid, error}
engramia_credential_cache_size gauge
engramia_credential_cache_hits_total counter
engramia_credential_cache_misses_total counter
engramia_demo_calls_total{tenant_tier} counter
engramia_master_key_failures_total counter
engramia_credential_validation_duration_seconds histogram
Health probe
Extend GET /v1/health/deep with credentials subsystem check:
{
"credentials": {
"status": "ok",
"master_key_loaded": true,
"active_credentials_count": 142,
"cache_size": 87
}
}
status=degraded if master_key_loaded=false (Engramia running but BYOK broken).
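The degraded rule can be expressed as a small payload builder. The function name credentials_health is hypothetical; the field names follow the JSON example above.

```python
def credentials_health(
    master_key_loaded: bool,
    active_credentials_count: int,
    cache_size: int,
) -> dict:
    """Build the credentials section of the deep health response."""
    return {
        "credentials": {
            # BYOK cannot decrypt anything without the master key.
            "status": "ok" if master_key_loaded else "degraded",
            "master_key_loaded": master_key_loaded,
            "active_credentials_count": active_credentials_count,
            "cache_size": cache_size,
        }
    }


print(credentials_health(False, 142, 0)["credentials"]["status"])  # degraded
```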
14. Future extensions
| Extension | Tier | Approach |
|---|---|---|
| HashiCorp Vault Transit backend | Enterprise | Replace AESGCMCipher with VaultTransitCipher calling Vault's /v1/transit/decrypt/engramia-credentials. Master key never leaves Vault. |
| AWS KMS / GCP KMS / Azure Key Vault | Enterprise | Same pattern as Vault — pluggable cipher backend. Selected via ENGRAMIA_CREDENTIALS_BACKEND={local,vault,aws_kms,gcp_kms,azure_kv}. |
| Per-role model routing | Business | role_models JSONB column populated via PATCH /v1/credentials/{id}/role-models. Resolver uses cred.model_for_role(role). |
| Provider failover chain | Business | tenant_credentials gets priority column; resolver tries primary, falls back to secondary on ProviderError. |
| Multi-region key replication | Enterprise | Vault Transit handles this natively; AWS KMS via multi-region keys. |
| Bring-your-own-master-key (BYOMK) | Enterprise | Tenant supplies their own master key via Vault namespace; Engramia decrypts via tenant's Vault, not operator's. |
| Provider quota / cost surfacing | Pro+ | Optional: scrape provider's billing API (OpenAI Usage, Anthropic Console) and surface in dashboard. Tenant grants Engramia read-only access to their billing API. |
15. Implementation effort summary
Per Phase 6.6 in the operator roadmap (Ops repo, private):
| Component | Effort |
|---|---|
| engramia/credentials/ package (5 modules) | 4 d |
| Alembic migration 023_tenant_credentials | 1 d |
| providers/{gemini,ollama,demo}.py | 2.5 d |
| Refactor _factory.py to per-tenant cache | 2 d |
| API endpoints /v1/credentials/* | 2 d |
| Dashboard UI /settings/llm-providers | 3 d |
| Onboarding "Skip for now" + demo banner | 1 d |
| Documentation (this file + provider setup guides) | 2 d |
| Stripe pricing migration to 5-tier | 0.5 d |
Total BYOK foundation: ~14 days of focused work (the rows above sum to 18 d; the first six, through the dashboard UI, sum to 14.5 d). Tier-gated features (Hosted MCP, per-role routing, cross-agent memory, Vault backend, etc.) follow per the internal pricing-tier roadmap (Ops repo, private).
See also
- Security Architecture — overall trust model, RBAC, multi-tenancy
- Production Hardening — TLS, secret management, rate limiting
- Environment Variables: ENGRAMIA_BYOK_ENABLED, ENGRAMIA_CREDENTIALS_KEY, ENGRAMIA_CREDENTIALS_BACKEND
- Pricing: tier feature matrix, BYOK availability per tier