Google ships Gemini in two architecturally distinct ways, and Halton Meter has an adapter for each:

- adapters/gemini.py owns generativelanguage.googleapis.com, the public Google AI Studio / Generative Language API. It uses API-key auth and REST endpoints under /v1*/models/<model>:<verb>.
- adapters/gemini_code_assist.py owns cloudcode-pa.googleapis.com, the Gemini Code Assist surface used by the JetBrains and VS Code Code Assist plugins. It uses OAuth tokens and internal paths under /v1internal:<verb>.

Both adapters share name="gemini" so reports roll them up under one
provider. The mode column distinguishes them at row level.
Public API — costed paths
generativelanguage.googleapis.com:
| Path | Purpose |
|---|---|
| /v1*/models/<model>:generateContent | Standard generation |
| /v1*/models/<model>:streamGenerateContent | Streaming variant |
| /v1*/models/<model>:embedContent | Single embedding |
| /v1*/models/<model>:batchEmbedContents | Batch embeddings |
| /v1*/models/<model>:countTokens | Token-count helper (zero-cost row, useful for visibility) |
/v1* covers /v1, /v1beta, /v1beta1, /v1alpha — the API
version path is part of the URL and matched as a prefix.
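
To make the prefix rule concrete, here is a minimal sketch of how a costed public-API path could be recognised. The regular expression, the `COSTED_VERBS` set, and the `parse_public_path` helper are hypothetical names chosen for illustration, not Halton Meter's actual implementation; the sketch assumes that any version segment starting with `v1` qualifies.

```python
import re

# Verbs taken from the costed-paths table for the public API.
COSTED_VERBS = {
    "generateContent",
    "streamGenerateContent",
    "embedContent",
    "batchEmbedContents",
    "countTokens",
}

# Hypothetical pattern: /<version>/models/<model>:<verb>, where the version
# segment only has to start with "v1" (assumed reading of "/v1*").
_PATH_RE = re.compile(
    r"^/(?P<version>v1[a-z0-9]*)/models/(?P<model>[^/:]+):(?P<verb>[A-Za-z]+)$"
)


def parse_public_path(path):
    """Return (version, model, verb) for a costed path, or None otherwise."""
    path = path.split("?", 1)[0]  # drop the query string, e.g. ?key=...
    match = _PATH_RE.match(path)
    if match is None or match.group("verb") not in COSTED_VERBS:
        return None
    return match.group("version"), match.group("model"), match.group("verb")
```

Under these assumptions, `/v1beta/models/gemini-2.5-flash:generateContent` parses into its three parts, while a `/v2/...` path or an unlisted verb yields `None`.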
Code Assist — costed paths
cloudcode-pa.googleapis.com:
| Path | Purpose |
|---|---|
| /v1internal:generateContent | Standard Code Assist completion |
| /v1internal:streamGenerateContent | Streaming Code Assist completion |
This is a differentiator: LiteLLM, Helicone, Langfuse, and OpenLLMetry don’t capture Gemini Code Assist, because its traffic doesn’t go through their SDK shims. Halton Meter captures it because it intercepts at the network layer.
Captured fields
For both adapters:

- provider = "gemini"
- model: from the response’s usageMetadata.modelVersion
- input_tokens: usageMetadata.promptTokenCount
- output_tokens: usageMetadata.candidatesTokenCount
- thinking_tokens: usageMetadata.thoughtsTokenCount, when extended thinking is enabled
- cost_usd_minor_units: computed against the active rate card

cache_read_tokens and cache_write_tokens are zero — Google’s
prompt-cache pricing model is different from Anthropic’s and isn’t
yet split out.
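
The field mapping can be sketched as a small function over a decoded response body. This is an illustrative sketch only: `extract_usage` is a hypothetical helper, the fallback to a top-level `modelVersion` key is an assumption added for robustness, and cost calculation is left out because the rate card format is not described here.

```python
import json


def extract_usage(body):
    """Build a usage row from a Gemini JSON response body (a str or bytes)."""
    data = json.loads(body)
    usage = data.get("usageMetadata", {})
    return {
        "provider": "gemini",
        # Assumption: fall back to a top-level modelVersion if the usage
        # block does not carry one.
        "model": usage.get("modelVersion") or data.get("modelVersion"),
        "input_tokens": usage.get("promptTokenCount", 0),
        "output_tokens": usage.get("candidatesTokenCount", 0),
        # Only present when thinking is enabled; default to zero otherwise.
        "thinking_tokens": usage.get("thoughtsTokenCount", 0),
        # Cache tokens are not split out for Gemini, so both stay at zero.
        "cache_read_tokens": 0,
        "cache_write_tokens": 0,
    }
```

Defaulting missing counters to zero keeps the row shape stable for responses that omit a counter, such as a request made without thinking enabled.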
Tools that route through these adapters
| Tool | Adapter | Auth |
|---|---|---|
| Google AI Studio Python SDK (google-genai) | gemini.py | API key |
| curl against generativelanguage.googleapis.com | gemini.py | API key |
| Gemini Code Assist (JetBrains, VS Code) | gemini_code_assist.py | OAuth |
Verifying public-API capture
$ halton-meter run -- curl -sS \
"https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GEMINI_API_KEY" \
-d '{"contents":[{"parts":[{"text":"hi"}]}]}'
$ halton-meter report --since 5m --by model

For Code Assist capture, run Code Assist normally after init --apps.
The OAuth handshake against cloudcode-pa.googleapis.com is
captured by the Code Assist adapter without further setup.
Error classification
Gemini reports errors as gRPC status codes mapped onto HTTP. Both adapters classify them into the seven canonical buckets — see Error classification. Shipped in v0.3.0.
| gRPC status | HTTP | error_class | retryable |
|---|---|---|---|
| INVALID_ARGUMENT | 400 | bad_request | false |
| UNAUTHENTICATED | 401 | auth | false |
| PERMISSION_DENIED | 403 | auth | false |
| NOT_FOUND | 404 | bad_request | false |
| FAILED_PRECONDITION | 400 | bad_request | false |
| RESOURCE_EXHAUSTED | 429 | rate_limit | true |
| DEADLINE_EXCEEDED | 504 | timeout | true |
| ABORTED | 409 | server_error | true |
| INTERNAL | 500 | server_error | true |
| UNAVAILABLE | 503 | server_error | true |
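
The table translates directly into a lookup keyed by the gRPC status name. The sketch below assumes the standard Google API error envelope, `{"error": {"code": ..., "message": ..., "status": ...}}`; the `GRPC_STATUS_CLASSES` mapping and `classify_error` helper are hypothetical names, and returning `None` for unlisted statuses is an assumption, since the table does not say how they are handled.

```python
# (error_class, retryable) keyed by gRPC status name, copied from the table.
GRPC_STATUS_CLASSES = {
    "INVALID_ARGUMENT": ("bad_request", False),
    "UNAUTHENTICATED": ("auth", False),
    "PERMISSION_DENIED": ("auth", False),
    "NOT_FOUND": ("bad_request", False),
    "FAILED_PRECONDITION": ("bad_request", False),
    "RESOURCE_EXHAUSTED": ("rate_limit", True),
    "DEADLINE_EXCEEDED": ("timeout", True),
    "ABORTED": ("server_error", True),
    "INTERNAL": ("server_error", True),
    "UNAVAILABLE": ("server_error", True),
}


def classify_error(error_body):
    """Map a decoded error payload to (error_class, retryable), or None."""
    status = error_body.get("error", {}).get("status")
    return GRPC_STATUS_CLASSES.get(status)
```

Keying on the status name rather than the HTTP code matters here because two statuses share HTTP 400 (INVALID_ARGUMENT and FAILED_PRECONDITION), so the HTTP code alone does not identify the gRPC status.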
Host matching
generativelanguage.googleapis.com and cloudcode-pa.googleapis.com
are matched exactly. Other Google subdomains (aiplatform.googleapis.com,
cloudaicompanion.googleapis.com) are not in scope today.
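
Exact host matching can be illustrated with a plain dictionary lookup on the parsed hostname. This is a sketch under stated assumptions: `ADAPTER_BY_HOST` and `adapter_for_url` are hypothetical names, and the real interception layer may see hosts in a different form (for example from TLS SNI rather than a full URL).

```python
from urllib.parse import urlsplit

# Hosts in scope today, mapped to the adapter module that owns each one.
ADAPTER_BY_HOST = {
    "generativelanguage.googleapis.com": "adapters/gemini.py",
    "cloudcode-pa.googleapis.com": "adapters/gemini_code_assist.py",
}


def adapter_for_url(url):
    """Return the owning adapter for a URL, or None if the host is out of scope."""
    # urlsplit().hostname is lowercased and has any port stripped.
    host = urlsplit(url).hostname
    # Exact match only: no suffix or wildcard matching on subdomains.
    return ADAPTER_BY_HOST.get(host)
```

Because the lookup is an exact comparison, hosts such as aiplatform.googleapis.com, or a lookalike that merely contains one of the names, fall through to `None`.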