ROI Score Calibration Guide
The composite ROI score (0-10) measures how much value Engramia's memory provides.
It is computed by the /v1/analytics/rollup endpoint for hourly, daily, or weekly windows.
Formula
| Component | Weight | Meaning |
|---|---|---|
| reuse_rate | 60% | Fraction of recalls that found a reusable pattern (duplicate + adapt tiers) |
| avg_eval_score | 40% | Average quality score of learned patterns |
The reuse component is weighted more heavily because reuse is the primary value signal — avoiding redundant work is why memory exists.
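As a rough illustration of the weighting, the sketch below combines the two components into a 0-10 score. It assumes reuse_rate is a fraction in [0, 1] and avg_eval_score is on a 0-10 scale. The function name, the scaling of reuse_rate by 10, the clamping, and the rounding are assumptions for illustration, not the endpoint's confirmed implementation.

```python
def composite_roi(reuse_rate: float, avg_eval_score: float) -> float:
    """Hypothetical sketch of the weighted ROI score on a 0-10 scale."""
    # Assumption: reuse_rate is a fraction (0.0-1.0); scale it to 0-10
    # so both components share the same range before weighting.
    reuse_component = max(0.0, min(1.0, reuse_rate)) * 10.0
    # Assumption: avg_eval_score is already on a 0-10 scale.
    eval_component = max(0.0, min(10.0, avg_eval_score))
    # Weights from the table above: 60% reuse, 40% eval quality.
    return round(0.6 * reuse_component + 0.4 * eval_component, 2)


example_score = composite_roi(reuse_rate=0.5, avg_eval_score=7.0)
print(example_score)  # 5.8
```

Under these assumptions, a 50% reuse rate with an average eval score of 7.0 yields 5.8.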
Score interpretation
| Score | Label | Meaning | Typical scenario |
|---|---|---|---|
| 0-2 | Cold start | Memory is barely useful | Few patterns stored, most recalls return "fresh" |
| 2-4 | Early value | Some patterns are being reused | ~20-30% recall reuse rate, moderate eval scores |
| 4-6 | Productive | Memory is delivering clear value | ~40-60% reuse rate, good eval scores |
| 6-8 | High efficiency | Strong pattern library | >60% reuse rate, consistently high eval scores |
| 8-10 | Optimal | Near-complete coverage | >80% reuse rate, excellent code quality |
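The bands above can be turned into a small lookup, for example when labeling rollup results in a dashboard. This is a minimal sketch: the function name is hypothetical, and because the table's ranges share their boundary values, it assumes a boundary score (such as exactly 2.0) belongs to the higher band.

```python
def roi_label(score: float) -> str:
    """Map a 0-10 ROI score to the label from the interpretation table."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"ROI score must be within 0-10, got {score}")
    # (exclusive upper bound, label) pairs in ascending order.
    # Assumption: a score on a shared boundary falls into the higher band.
    bands = [
        (2.0, "Cold start"),
        (4.0, "Early value"),
        (6.0, "Productive"),
        (8.0, "High efficiency"),
    ]
    for upper_bound, label in bands:
        if score < upper_bound:
            return label
    return "Optimal"


print(roi_label(5.8))  # Productive
```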
Interpreting the components
Reuse rate
- < 20%: Memory has poor coverage of the task domain; more patterns are needed.
- 20-50%: Growing coverage. Normal for the first weeks of use.
- 50-80%: Strong coverage. Most tasks have relevant prior experience.
- > 80%: Excellent. The agent rarely encounters truly novel tasks.
Average eval score
- < 5.0: Stored patterns have quality issues. Review evaluation criteria.
- 5.0-7.0: Acceptable quality. Typical for automated scoring.
- 7.0-9.0: High quality patterns. Agents produce consistently good code.
- > 9.0: Exceptional. May indicate eval scoring is too lenient.
Improving your ROI score
| Problem | Score symptom | Action |
|---|---|---|
| Low reuse rate | ROI < 3 despite good eval scores | Learn more patterns; broaden task coverage |
| Low eval scores | ROI < 5 despite decent reuse | Review code quality; tighten eval prompts |
| High variance in evals | Inconsistent scores | Increase num_evals (3-5) for a more stable median |
| Stale patterns | Declining ROI over time | Run engramia aging regularly; patterns decay 2%/week |
| Wrong domain | ROI stuck at 0 | Ensure recalled tasks match the agent's actual workload |
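To illustrate why a higher num_evals gives a more stable result, the sketch below takes the median of several eval runs for a single pattern. The scores are made-up example values and the helper name is hypothetical. The use of a median follows the table's wording, but how Engramia combines runs internally is not confirmed here.

```python
from statistics import median


def aggregate_evals(scores: list[float]) -> float:
    """Combine repeated eval runs for one pattern using the median."""
    if not scores:
        raise ValueError("at least one eval score is required")
    return median(scores)


# With a single run, one unlucky evaluation decides the stored score.
single_run = aggregate_evals([3.0])

# With five runs, the same outlier (3.0) no longer moves the result.
five_runs = aggregate_evals([3.0, 7.5, 7.0, 8.0, 7.5])

print(single_run, five_runs)  # 3.0 7.5
```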
Percentile metrics
The rollup also includes p50 (median) and p90 eval scores:
- p50: the median eval score; half of learned patterns score above this value. A reliable "typical quality" indicator.
- p90: the eval score that only the top 10% of patterns exceed. Useful for identifying your best patterns.
A large gap between p50 and p90 suggests inconsistent code quality — focus on raising the floor rather than the ceiling.
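The p50/p90 gap can be checked on any list of eval scores. The sketch below uses the nearest-rank percentile method on made-up example scores. The percentile method the rollup actually uses is not stated in this guide, so treat the method, the helper name, and the data as assumptions.

```python
import math


def percentile(scores: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of scores at or below it."""
    if not scores:
        raise ValueError("at least one score is required")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range 1-100")
    ordered = sorted(scores)
    # Rank is 1-based; integer inputs keep the division exact where possible.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical eval scores for ten learned patterns.
eval_scores = [4.0, 5.5, 6.0, 6.5, 7.0, 7.0, 7.5, 8.0, 9.0, 9.5]

p50 = percentile(eval_scores, 50)
p90 = percentile(eval_scores, 90)
gap = p90 - p50

print(p50, p90, gap)  # 7.0 9.0 2.0
```

In this example the best patterns score well above the typical one, so the low-scoring patterns (4.0, 5.5) are the ones to review first.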