4. Bulk-import to Engramia¶
You have an NDJSON dump of OpenAI messages from step 3. This step writes them into Engramia as patterns under the correct scope.
Why not /v1/import?
POST /v1/import is the symmetric pair to GET /v1/export — it expects round-tripped Engramia exports ({version, key, data} records). For a fresh-data migration like this one, use Memory.learn() (Python) or POST /v1/learn (REST) instead — one call per user/assistant message pair.
Mapping decisions¶
You have to make three decisions before running the import:
a) Which scope does each Thread land in?¶
Pick one of:
- One project per assistant — clean separation, easiest to reason about. Recommended if you had 3-10 distinct assistants.
- One project per customer/tenant — best if your assistants are customer-facing and you need GDPR-grade isolation.
- One project total — only if your assistants were internal and you want everything in one shared recall pool.
Set ENGRAMIA_TENANT_ID and ENGRAMIA_PROJECT_ID accordingly when running the importer. (Or pass them as auth_context if you're calling Engramia via REST.)
b) What is eval_score for an imported message?¶
OpenAI doesn't tell you whether a past assistant message was good or bad. Three options:
| Strategy | Score | When |
|---|---|---|
| Neutral default | 5.0 |
Default. Patterns are recallable but won't dominate fresh, evaluated runs. |
| Heuristic from feedback | 7.0 if you have positive UX signal (thumbs-up, accepted suggestion); 3.0 if negative |
If you stored user feedback alongside thread_id. |
| Re-evaluate post-hoc | Run memory.evaluate() on each pair after import |
Costs one LLM call per pattern. Most accurate. Skip on first pass. |
Default to neutral (5.0) on the first import. You can re-evaluate later with engramia evaluate-imported (CLI) once the new system is live.
c) How are user/assistant pairs grouped into patterns?¶
Each Engramia pattern is (task, code, eval_score). The natural grouping is:
user message → pattern.task
next assistant → pattern.code (or pattern.output)
message in same eval_score = 5.0 (neutral)
thread
Multi-turn refinements (user, assistant, user, assistant) become two patterns, not one. This is intentional — recall ranks them independently.
The import script¶
# import_to_engramia.py — convert exported NDJSON into Engramia patterns
import json
import sys
from engramia import Memory
def pair_messages(records):
"""Group consecutive (user, assistant) pairs from one thread."""
by_thread: dict[str, list] = {}
for r in records:
by_thread.setdefault(r["thread_id"], []).append(r)
for tid, msgs in by_thread.items():
msgs.sort(key=lambda r: r["created_at"])
i = 0
while i < len(msgs) - 1:
if msgs[i]["role"] == "user" and msgs[i + 1]["role"] == "assistant":
yield msgs[i], msgs[i + 1]
i += 2
else:
i += 1
def main(ndjson_path: str):
memory = Memory() # picks up ENGRAMIA_* env vars
with open(ndjson_path) as f:
records = [json.loads(line) for line in f if line.strip()]
pairs = list(pair_messages(records))
print(f"Found {len(pairs)} (user, assistant) pairs across "
f"{len({r['thread_id'] for r in records})} threads")
imported = 0
for user_msg, asst_msg in pairs:
try:
memory.learn(
task=user_msg["text"][:2000], # truncate to safety limit
code=asst_msg["text"][:8000],
eval_score=5.0, # neutral — see step (b)
source="import",
run_id=f"openai-thread:{user_msg['thread_id']}",
on_duplicate="skip", # don't clobber later, evaluated patterns
)
imported += 1
except Exception as exc:
print(f"skip {user_msg['message_id']}: {exc}")
print(f"Imported {imported}/{len(pairs)} patterns")
if __name__ == "__main__":
main(sys.argv[1])
Run it:
export ENGRAMIA_TENANT_ID=t_abc
export ENGRAMIA_PROJECT_ID=p_main
python import_to_engramia.py threads.ndjson
REST variant¶
If you don't run Python in your import host, the same logic over REST:
curl -X POST https://api.engramia.dev/v1/learn \
-H "Authorization: Bearer $ENGRAMIA_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"task": "Build a CSV parser",
"code": "import csv\n...",
"eval_score": 5.0,
"source": "import",
"run_id": "openai-thread:thread_abc"
}'
One request per pattern. Rate-limited at ~30 req/sec on hosted plans — for >50k patterns, prefer the Python path.
Dedup¶
on_duplicate="skip" is recommended for backfills. Two reasons:
- If you re-run the importer, you don't want it to overwrite patterns the new system has since evaluated and improved.
- The Jaccard-0.92 dedup threshold catches near-duplicate user prompts ("build CSV parser" vs "build a CSV parser") — without it you'd accumulate noise.
Verify¶
The response shows pattern_count for the current scope. Should match imported from the script (within the dedup tolerance).
Next¶
Tools & files mapping — what to do about code_interpreter, file_search, and your function-calling tools.