Google Gemini — BYOK Setup¶
Gemini is the cheapest of Engramia's first-party providers and offers
both LLM (gemini-2.5-flash / gemini-2.5-pro) and embeddings
(gemini-embedding-001) under one key. Default LLM model is
gemini-2.5-flash for cost reasons.
1. Create an API key in Google AI Studio¶
- Sign in at aistudio.google.com.
- Click Get API key → Create API key.
- Pick the Google Cloud project you want billing to land on
(Gemini API requires a billing-enabled project for paid models;
gemini-2.5-flashhas a generous free tier). - Copy the key. Gemini keys begin with
AIza.
Note: This guide is for the Gemini Developer API (AI Studio). Vertex AI Gemini uses a different auth flow (service account JSON + ADC) that Engramia does not currently support — vote for it on the roadmap if you need it.
2. Set up billing limits¶
- Cloud Console → Billing → Budgets & alerts.
- Create a monthly budget for the project.
- Set an alert at 50 %, 90 %, and 100 % of budget.
Google does not hard-stop API calls when a budget is exceeded — the budget is informational. To enforce a real cap, disable the API on the project temporarily; Engramia will see a 403 and fall back to demo mode.
3. Add the key in Engramia¶
In the dashboard:
- Settings → LLM Providers → Add provider.
- Provider: Google Gemini.
- Paste the key (starts with
AIza). - Default model: leave blank for
gemini-2.5-flash, or set: gemini-2.5-flash— default, cheapest, 1M-token contextgemini-2.5-pro— premium, best for complex reasoninggemini-flash-lite— even cheaper for high-volumeeval- Click Validate & save. Engramia pings the Gemini SDK
(
models.list) with a 5-second timeout.
REST equivalent:
curl -X POST https://api.engramia.dev/v1/credentials \
-H "Authorization: Bearer engramia-prod-..." \
-H "Content-Type: application/json" \
-d '{
"provider": "gemini",
"purpose": "both",
"api_key": "AIza...",
"default_model": "gemini-2.5-flash",
"default_embed_model": "gemini-embedding-001"
}'
4. Embedding dimensionality note¶
The default gemini-embedding-001 model produces 3072-dim vectors
— much larger than OpenAI's text-embedding-3-small (1536-dim).
Engramia's pgvector schema fixes the dimension at HNSW DDL time, so
switching from another embedding model to Gemini requires reindexing
the embedding column. See embedding-reindex.md
for the procedure (Pro+ tier).
If you are starting fresh, Gemini embeddings are perfectly fine. If you already have patterns indexed with OpenAI 1536-dim embeddings, either:
- Stick with an OpenAI embedding credential (set
purpose=embeddingon the OpenAI row,purpose=llmon the Gemini row), or - Run the reindex and accept the disk cost (~2× storage for embeddings).
5. Per-role routing (Business tier)¶
curl -X PATCH https://api.engramia.dev/v1/credentials/{id} \
-H "Authorization: Bearer engramia-prod-..." \
-H "Content-Type: application/json" \
-d '{
"role_models": {
"eval": "gemini-flash-lite",
"evolve": "gemini-2.5-pro",
"default": "gemini-2.5-flash"
}
}'
Troubleshooting¶
400 category=auth_failed
: Either the key is wrong or the Gemini API isn't enabled on the
underlying Google Cloud project. Check
Cloud Console → APIs & Services → Library → Generative Language API
→ Enable.
400 category=unreachable
: AI Studio occasionally rate-limits the models.list ping during
region failovers. Wait a minute and retry.
Embeddings dimensions mismatch (recall returns 500) : You changed the embedding provider without reindexing. See the warning in step 4 and run embedding-reindex.md.
MAX_TOKENS errors in eval
: Gemini 2.5 Flash has an 8k output token default in the SDK; raise
it explicitly via default_model settings if your eval prompts
routinely exceed that. Engramia caps at 4096 by default, which is
enough for a structured eval JSON response.