7. Verification & rollback¶
Pre-cutover checklist, post-cutover smoke tests, and the exact rollback path if something breaks.
Pre-cutover checklist¶
Tick every box before flipping the read path in step 6.
Code¶
- [ ] pip show engramia reports 0.6.6 (or your pinned version) on every host serving traffic.
- [ ] python -c "from engramia.sdk.openai_agents import EngramiaRunHooks, engramia_instructions; print('ok')" prints ok.
- [ ] Your agent's instructions= argument uses engramia_instructions(memory, base=...), not a static string.
- [ ] Every Runner.run(...) call passes hooks=EngramiaRunHooks(memory).
Data¶
- [ ] Backfill from step 4 is complete: GET /v1/metrics reports pattern_count ≥ expected.
- [ ] Spot-check 5 random patterns: GET /v1/recall?task=<sample-user-prompt>&limit=3 returns relevant results.
- [ ] Pattern store size is realistic. If you imported 100k threads and got 100 patterns, the dedup threshold ate everything; investigate before continuing.
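The size check in the last item above can be expressed as a simple ratio test. This is a minimal sketch: the function name and the 1% floor are illustrative assumptions, not Engramia settings, so choose a floor that reflects how repetitive your own threads are.

```python
def dedup_looks_too_aggressive(imported_threads: int, pattern_count: int,
                               min_ratio: float = 0.01) -> bool:
    """Return True when suspiciously few patterns survived the import.

    min_ratio is a hypothetical floor (1% here), not an Engramia default.
    """
    if imported_threads <= 0:
        raise ValueError("imported_threads must be positive")
    return pattern_count / imported_threads < min_ratio


# The example from the checklist: 100k threads collapsing into 100 patterns.
print(dedup_looks_too_aggressive(100_000, 100))     # True  (ratio 0.001)
print(dedup_looks_too_aggressive(100_000, 40_000))  # False (ratio 0.4)
```

Feed it the thread count from your import log and the pattern_count reported by GET /v1/metrics.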
Scope & RBAC¶
- [ ] ENGRAMIA_TENANT_ID and ENGRAMIA_PROJECT_ID are set per service. Wrong scope = recall pulls patterns from the wrong customer.
- [ ] At least one Engramia API key has the editor role (needed for learn/import); user-facing services need only reader.
- [ ] If you self-host, you have a backup of engramia_data/ (JSON) or the PostgreSQL DB, taken today.
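The scope item above lends itself to a startup guard. The following is a sketch only: the helper name is made up for illustration, and it can detect a missing or blank variable but not a value that points at the wrong customer, which still needs a manual check.

```python
import os
from typing import Mapping

REQUIRED_SCOPE_VARS = ("ENGRAMIA_TENANT_ID", "ENGRAMIA_PROJECT_ID")


def missing_scope_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the scope variables that are unset or blank in env."""
    return [name for name in REQUIRED_SCOPE_VARS
            if not env.get(name, "").strip()]


def require_scope(env: Mapping[str, str] = os.environ) -> None:
    """Raise at service startup instead of recalling from an unintended scope."""
    missing = missing_scope_vars(env)
    if missing:
        raise RuntimeError("missing scope variables: " + ", ".join(missing))
```

Calling require_scope() once during service startup makes a misconfigured host fail fast rather than serve traffic.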
Observability¶
- [ ] Engramia health: curl https://api.engramia.dev/v1/health returns 200 ok.
- [ ] Your error budget tracks engramia_* metrics (pattern_count, avg_eval_score, success_rate). See Monitoring.
- [ ] Compare-log volume from step 6 is non-zero for at least 7 consecutive days.
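The 7-day requirement in the last item can be checked from per-day compare-log counts. This sketch assumes you can export one count per day, ordered oldest to newest, from your own logging stack, and it reads "7 consecutive days" as the 7 most recent days; both the input shape and the function names are assumptions.

```python
def trailing_nonzero_days(daily_counts: list[int]) -> int:
    """Count the most recent days in a row with at least one compare-log entry.

    daily_counts is ordered oldest to newest.
    """
    streak = 0
    for count in reversed(daily_counts):
        if count <= 0:
            break
        streak += 1
    return streak


def compare_log_ready(daily_counts: list[int], required_days: int = 7) -> bool:
    """True when the current non-zero streak covers the required window."""
    return trailing_nonzero_days(daily_counts) >= required_days


print(compare_log_ready([12, 9, 15, 0, 8, 11, 14]))   # False (gap 4 days ago)
print(compare_log_ready([12, 9, 15, 3, 8, 11, 14]))   # True
```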
Post-cutover smoke tests¶
Run all five within 30 minutes of flipping each rollout step (5%, 25%, 50%, 100%).
1. End-to-end agent round-trip¶
import asyncio

from agents import Runner

# import your actual agent + hooks here


async def smoke():
    result = await Runner.run(agent, "Hello, are you there?", hooks=hooks)
    assert result.final_output, "empty response"
    print(result.final_output)


asyncio.run(smoke())
Pass condition: response is non-empty and on-topic.
2. Recall returns relevant patterns¶
curl "https://api.engramia.dev/v1/recall?task=<a%20task%20you%20definitely%20backfilled>&limit=3" \
  -H "Authorization: Bearer $ENGRAMIA_API_KEY"
Pass condition: response contains 1-3 matches with reuse_tier of duplicate or adapt.
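If you want to script this pass condition, a check along the following lines may help. It is a sketch under two loud assumptions: that the recall response is a JSON object with a top-level "matches" list whose entries carry a "reuse_tier" field (the response shape is not documented in this section), and that every returned match, not just one, must be in an accepted tier.

```python
ACCEPTED_TIERS = {"duplicate", "adapt"}


def recall_passes(response: dict) -> bool:
    """Apply the pass condition to an already-parsed recall response.

    Assumes a top-level "matches" list; adjust the key to the real schema.
    """
    matches = response.get("matches", [])
    if not 1 <= len(matches) <= 3:
        return False
    return all(m.get("reuse_tier") in ACCEPTED_TIERS for m in matches)


sample = {"matches": [{"reuse_tier": "duplicate"}, {"reuse_tier": "adapt"}]}
print(recall_passes(sample))           # True
print(recall_passes({"matches": []}))  # False
```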
3. Learn writes back¶
After running the smoke test from step 1, immediately call GET /v1/metrics again and verify that pattern_count increased by 1. If it did not, EngramiaRunHooks is not wired correctly; recheck Runner.run(..., hooks=hooks).
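The before/after comparison can be scripted roughly as follows. This is a sketch, not the official client: it assumes pattern_count is a top-level field in the /v1/metrics JSON body and that the endpoint accepts the same Bearer header as the other calls in this guide. It also accepts an increase of more than 1, since live traffic during a rollout step may write patterns at the same time.

```python
import json
import urllib.request

BASE_URL = "https://api.engramia.dev"


def fetch_pattern_count(api_key: str, base_url: str = BASE_URL) -> int:
    """GET /v1/metrics and return pattern_count (assumed to be top-level)."""
    request = urllib.request.Request(
        f"{base_url}/v1/metrics",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return int(json.load(response)["pattern_count"])


def learn_wrote_back(count_before: int, count_after: int,
                     expected_new: int = 1) -> bool:
    """True when at least expected_new patterns were written between reads."""
    return count_after - count_before >= expected_new
```

Call fetch_pattern_count once before the step 1 smoke test and once after it, then pass both values to learn_wrote_back.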
4. Audit log records the cutover¶
Pass condition: recent events show learn actions with source=api. If everything is source=import, no live traffic is reaching Engramia.
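Once you have the recent audit events in hand, the source breakdown can be tallied as below. This is a sketch that assumes each event is a dict with "action" and "source" keys; the actual audit event schema is not shown in this section, so treat those key names as placeholders.

```python
from collections import Counter


def learn_sources(events: list[dict]) -> Counter:
    """Tally the source of every learn event (assumed keys: action, source)."""
    return Counter(e.get("source") for e in events if e.get("action") == "learn")


def live_traffic_reaching_engramia(events: list[dict]) -> bool:
    """Pass when at least one learn event came from live API traffic."""
    return learn_sources(events)["api"] > 0


events = [
    {"action": "learn", "source": "import"},
    {"action": "learn", "source": "api"},
    {"action": "recall", "source": "api"},
]
print(learn_sources(events))                   # Counter({'import': 1, 'api': 1})
print(live_traffic_reaching_engramia(events))  # True
```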
5. No quota exhaustion¶
Pass condition: pattern_count_used is well below pattern_count_limit. If you're at 95%, the bulk import overshot — either upgrade the plan or run scoped deletion of the lowest-eval-score patterns. See Pricing.
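The quota arithmetic can be sketched as a small classifier. The 95% action threshold comes from the text above; the 80% "comfortable" threshold and the status labels are illustrative assumptions, since "well below" is not quantified here.

```python
def quota_usage(pattern_count_used: int, pattern_count_limit: int) -> float:
    """Return the used fraction of the pattern quota."""
    if pattern_count_limit <= 0:
        raise ValueError("pattern_count_limit must be positive")
    return pattern_count_used / pattern_count_limit


def quota_status(pattern_count_used: int, pattern_count_limit: int,
                 comfortable_below: float = 0.80,  # assumption, not from Engramia
                 act_at: float = 0.95) -> str:     # the 95% mark from the guide
    """Classify quota headroom as 'ok', 'watch', or 'act'."""
    usage = quota_usage(pattern_count_used, pattern_count_limit)
    if usage >= act_at:
        return "act"
    if usage >= comfortable_below:
        return "watch"
    return "ok"


print(quota_status(5_000, 10_000))  # ok
print(quota_status(9_000, 10_000))  # watch
print(quota_status(9_500, 10_000))  # act
```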
Rollback¶
If any smoke test fails or quality drops in the first 24 hours after a rollout step:
Stage 1 — Flip the flag back¶
Setting the read-path flag from step 6 back to its previous value is the entire rollback for the read path. Threads still exist on the OpenAI side until 26 August 2026, so reads continue to work.
Stage 2 — Stop the shadow writes (only if Engramia itself is the problem)¶
Engramia data accumulated during dual-write is preserved — you can resume the cutover later without re-importing.
Stage 3 — Re-import only if pattern store is corrupt¶
If the issue is that imported patterns are wrong (bad scope, bad chunking, polluted with PII), purge and re-import:
# Scope-bounded delete, dry run first:
curl -X POST "https://api.engramia.dev/v1/governance/delete-scope?dry_run=true" \
  -H "Authorization: Bearer $ENGRAMIA_API_KEY"

# Then real:
curl -X POST "https://api.engramia.dev/v1/governance/delete-scope" \
  -H "Authorization: Bearer $ENGRAMIA_API_KEY"
Then re-run step 4 with the fix.
You're done¶
Once the 100% rollout has been live for 7 days and step 6 stage 4 is complete (no new Threads being created), the migration is complete. Some teams keep the dual-write running past 26 August 2026 just to capture audit-log evidence; that's optional and costs you a small amount of OpenAI quota.
Now ship.
If something in this guide didn't match what you saw, please open an issue at github.com/engramia/engramia/issues — migration docs decay fastest, and we want to know.