Runbook: Rate Limit Tuning¶
Current Defaults¶
| Env var | Default | Applies to |
|---|---|---|
ENGRAMIA_RATE_LIMIT_DEFAULT |
60 req/min | All endpoints |
ENGRAMIA_RATE_LIMIT_EXPENSIVE |
10 req/min | /evaluate, /compose, /evolve |
Symptoms Indicating Limit Too Low¶
- Legitimate clients receive
429 Too Many Requests - Audit logs show
RATE_LIMITEDevents for known good IPs - Users report intermittent failures during batch operations
Symptoms Indicating Limit Too High¶
- LLM API costs spike unexpectedly
- VM CPU/memory saturated during eval bursts
Diagnosing Rate Limit Events¶
# Find rate-limited IPs and paths in logs
ssh root@engramia-staging \
'docker compose -f /opt/engramia/docker-compose.prod.yml logs engramia-api \
--since 1h | grep "Rate limit exceeded"'
# Check audit table (if DB auth mode)
ssh root@engramia-staging \
'docker compose -f /opt/engramia/docker-compose.prod.yml exec pgvector \
psql -U engramia -c "
SELECT ip, path, count(*), max(created_at)
FROM audit_log
WHERE event = '"'"'rate_limited'"'"'
AND created_at > NOW() - INTERVAL '"'"'1 hour'"'"'
GROUP BY ip, path
ORDER BY count DESC
LIMIT 20;"'
Adjusting Limits¶
Temporary (no restart)¶
The rate limiter reads env vars at startup only. A restart is required.
Permanent¶
ssh root@engramia-staging '
cd /opt/engramia
# Increase default limit to 120/min, keep expensive at 10/min
sed -i "s/^ENGRAMIA_RATE_LIMIT_DEFAULT=.*/ENGRAMIA_RATE_LIMIT_DEFAULT=120/" .env
grep -q ENGRAMIA_RATE_LIMIT_DEFAULT .env || echo "ENGRAMIA_RATE_LIMIT_DEFAULT=120" >> .env
docker compose -f docker-compose.prod.yml restart engramia-api
'
Per-client exemption (workaround)¶
The current rate limiter is per-IP. For trusted batch clients, consider:
1. Running them via a dedicated IP with a higher limit, or
2. Moving to the async job API (Prefer: respond-async header) to avoid synchronous rate limiting
Resetting Rate Limit State¶
The in-memory counter resets on API restart:
ssh root@engramia-staging \
'docker compose -f /opt/engramia/docker-compose.prod.yml restart engramia-api'
Recommendations¶
- Keep
ENGRAMIA_RATE_LIMIT_EXPENSIVE≤ 20 to control LLM costs - Alert when 429 rate exceeds 5% of total requests (Prometheus)
- For multi-instance deployments, replace the in-memory limiter with Redis