Runbook: Disk Full¶
Symptoms¶
- API returns 500 errors or hangs
docker compose logs engramia-apishowsOSError: No space left on devicedf -hon the VM shows 100% usage on/
Diagnostics¶
# Check disk usage on VM
ssh root@engramia-staging 'df -h'
# Find the largest consumers
ssh root@engramia-staging 'du -sh /var/lib/docker/volumes/* | sort -rh | head -20'
# Check Docker volume sizes specifically
ssh root@engramia-staging 'docker system df -v'
# Check PostgreSQL data volume
ssh root@engramia-staging 'du -sh /var/lib/docker/volumes/engramia_pgdata'
# Check log sizes
ssh root@engramia-staging 'journalctl --disk-usage'
Resolution¶
Step 1 — Free Docker overhead (safe, fast)¶
ssh root@engramia-staging 'docker system prune -f'
# Removes stopped containers, dangling images, unused networks
# Does NOT remove volumes
Step 2 — Truncate old container logs¶
ssh root@engramia-staging '
for log in $(docker inspect --format="{{.LogPath}}" $(docker ps -q)); do
truncate -s 0 "$log"
done
'
Step 3 — Rotate journal logs¶
Step 4 — If pgdata is the problem¶
# Connect to PostgreSQL and VACUUM
ssh root@engramia-staging \
'docker compose -f /opt/engramia/docker-compose.prod.yml exec pgvector \
psql -U engramia -c "VACUUM FULL memory_data; VACUUM FULL memory_embeddings;"'
# Check table bloat before/after
ssh root@engramia-staging \
'docker compose -f /opt/engramia/docker-compose.prod.yml exec pgvector \
psql -U engramia -c "\dt+"'
Step 5 — Emergency: remove old Docker images¶
# List images sorted by size
ssh root@engramia-staging 'docker images --format "table {{.Repository}}\t{{.Tag}}\t{{.Size}}" | sort -k3 -rh'
# Remove old Engramia images (keep current IMAGE_TAG)
ssh root@engramia-staging 'docker image prune -a --filter "until=72h" -f'
Prevention¶
- Configure log rotation in
/etc/docker/daemon.json: - Run
VACUUM ANALYZEweekly via cron - Monitor disk via Prometheus
node_filesystem_free_bytesalert at < 20%
Escalation¶
If disk is full and none of the above frees enough space, scale the Hetzner volume or upgrade to CX33.