Gemini · Halton Meter docs

Two adapters cover Gemini: one for the public Google AI API (API-key auth) and one for the Gemini Code Assist surface (OAuth, used by the IDE plugins).

Google ships Gemini in two architecturally distinct ways, and Halton Meter has an adapter for each:

adapters/gemini.py — owns generativelanguage.googleapis.com. The public Google AI Studio / Generative Language API. API-key auth, REST endpoints under /v1*/models/<model>:<verb>.
adapters/gemini_code_assist.py — owns cloudcode-pa.googleapis.com. The Gemini Code Assist surface used by the JetBrains and VS Code Code Assist plugins. OAuth tokens, internal paths under /v1internal:<verb>.

Both adapters share name="gemini" so reports roll them up under one provider. The mode column distinguishes them at row level.

Public API — costed paths

generativelanguage.googleapis.com:

Path	Purpose
`/v1*/models/<model>:generateContent`	Standard generation
`/v1*/models/<model>:streamGenerateContent`	Streaming variant
`/v1*/models/<model>:embedContent`	Single embedding
`/v1*/models/<model>:batchEmbedContents`	Batch embeddings
`/v1*/models/<model>:countTokens`	Token-count helper (zero-cost row, useful for visibility)

/v1* covers /v1, /v1beta, /v1beta1, and /v1alpha. The API version is part of the URL path, and any version segment that begins with /v1 matches.
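The prefix rule above can be sketched as a single regular expression. This is an illustrative sketch, not the adapter's actual code: the names `COSTED_VERBS`, `COSTED_PATH`, and `match_costed_path` are hypothetical, and the pattern assumes the version segment contains only lowercase letters and digits after `v1`.

```python
import re

# The five costed verbs from the table above.
COSTED_VERBS = (
    "generateContent",
    "streamGenerateContent",
    "embedContent",
    "batchEmbedContents",
    "countTokens",
)

# Hypothetical pattern: a version segment starting with "v1", then
# /models/<model>:<verb>, where <verb> is one of the costed verbs.
COSTED_PATH = re.compile(
    r"^/v1[a-z0-9]*/models/(?P<model>[^/:]+):(?P<verb>"
    + "|".join(COSTED_VERBS)
    + r")$"
)


def match_costed_path(path: str):
    """Return (model, verb) for a costed path, or None if it does not match."""
    path = path.split("?", 1)[0]  # drop any query string such as ?key=...
    m = COSTED_PATH.match(path)
    return (m.group("model"), m.group("verb")) if m else None
```

Under these assumptions, `/v1beta/models/gemini-2.5-flash:generateContent` yields the model and verb, while a `/v2/...` path or an unlisted verb returns `None`.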

Code Assist — costed paths

cloudcode-pa.googleapis.com:

Path	Purpose
`/v1internal:generateContent`	Standard Code Assist completion
`/v1internal:streamGenerateContent`	Streaming Code Assist completion

This is a differentiator. LiteLLM, Helicone, Langfuse, and OpenLLMetry do not capture Gemini Code Assist, because its traffic never passes through their SDK shims. Halton Meter captures it because it intercepts at the network layer.

Captured fields

For both adapters:

provider = "gemini"
model: from the response’s usageMetadata.modelVersion
input_tokens: usageMetadata.promptTokenCount
output_tokens: usageMetadata.candidatesTokenCount
thinking_tokens: usageMetadata.thoughtsTokenCount, present when the model’s thinking mode is enabled
cost_usd_minor_units: computed from the token counts against the active rate card

cache_read_tokens and cache_write_tokens are always recorded as zero. Google’s prompt-cache pricing model differs from Anthropic’s, so cached tokens are not yet split out.
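As a rough illustration of the field mapping above, the sketch below turns a parsed JSON response body into a usage row. The function name `extract_usage` and the row shape are assumptions made for this example; the field paths follow the list above, and missing counters default to zero. Cost is left out because the rate-card format is not described on this page.

```python
def extract_usage(body: dict) -> dict:
    """Map a parsed Gemini response body to a usage row (hypothetical shape)."""
    usage = body.get("usageMetadata") or {}
    return {
        "provider": "gemini",
        # Field paths as listed above; absent counters are treated as 0.
        "model": usage.get("modelVersion"),
        "input_tokens": int(usage.get("promptTokenCount", 0)),
        "output_tokens": int(usage.get("candidatesTokenCount", 0)),
        "thinking_tokens": int(usage.get("thoughtsTokenCount", 0)),
        # Cache tokens are not split out for Gemini, so both stay zero.
        "cache_read_tokens": 0,
        "cache_write_tokens": 0,
    }
```

A response without `thoughtsTokenCount` simply produces `thinking_tokens` of 0 in this sketch.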

Tools that route through these adapters

Tool	Adapter	Auth
Google AI Studio Python SDK (`google-genai`)	`gemini.py`	API key
`curl` against `generativelanguage.googleapis.com`	`gemini.py`	API key
Gemini Code Assist (JetBrains, VS Code)	`gemini_code_assist.py`	OAuth

Verifying public-API capture

$ halton-meter run -- curl -sS \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent?key=$GEMINI_API_KEY" \
  -d '{"contents":[{"parts":[{"text":"hi"}]}]}'

$ halton-meter report --since 5m --by model

To capture Code Assist, run it normally after init --apps. Its OAuth-authenticated traffic to cloudcode-pa.googleapis.com is captured by the Code Assist adapter without further setup.

Error classification

Gemini reports errors as gRPC status codes mapped onto HTTP status codes. Both adapters classify them into the seven canonical buckets (see Error classification). This classification shipped in v0.3.0.

gRPC status	HTTP	`error_class`	`retryable`
`INVALID_ARGUMENT`	400	`bad_request`	false
`UNAUTHENTICATED`	401	`auth`	false
`PERMISSION_DENIED`	403	`auth`	false
`NOT_FOUND`	404	`bad_request`	false
`FAILED_PRECONDITION`	400	`bad_request`	false
`RESOURCE_EXHAUSTED`	429	`rate_limit`	true
`DEADLINE_EXCEEDED`	504	`timeout`	true
`ABORTED`	409	`server_error`	true
`INTERNAL`	500	`server_error`	true
`UNAVAILABLE`	503	`server_error`	true
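The table above amounts to a lookup keyed on the gRPC status name. The sketch below assumes the standard Google API error envelope, `{"error": {"code": ..., "message": ..., "status": ...}}`; the names `ERROR_CLASSES` and `classify_error` are hypothetical, and returning `None` for unlisted statuses is a choice made for this example, not documented adapter behavior.

```python
# gRPC status name -> (error_class, retryable), copied from the table above.
ERROR_CLASSES = {
    "INVALID_ARGUMENT": ("bad_request", False),
    "UNAUTHENTICATED": ("auth", False),
    "PERMISSION_DENIED": ("auth", False),
    "NOT_FOUND": ("bad_request", False),
    "FAILED_PRECONDITION": ("bad_request", False),
    "RESOURCE_EXHAUSTED": ("rate_limit", True),
    "DEADLINE_EXCEEDED": ("timeout", True),
    "ABORTED": ("server_error", True),
    "INTERNAL": ("server_error", True),
    "UNAVAILABLE": ("server_error", True),
}


def classify_error(body: dict):
    """Return (error_class, retryable) for an error body, or None if unlisted."""
    status = (body.get("error") or {}).get("status")
    return ERROR_CLASSES.get(status)
```

For example, a 429 body whose `error.status` is `RESOURCE_EXHAUSTED` classifies as `rate_limit` and retryable.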

Host matching

generativelanguage.googleapis.com and cloudcode-pa.googleapis.com are matched exactly. Other Google subdomains (aiplatform.googleapis.com, cloudaicompanion.googleapis.com) are not in scope today.
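Exact host matching can be pictured as a dictionary lookup on the request hostname, with no suffix or wildcard logic. This is a minimal sketch under that assumption; `ADAPTER_BY_HOST` and `adapter_for` are illustrative names, not Halton Meter's API.

```python
from urllib.parse import urlsplit

# Exact hostnames only: other googleapis.com subdomains deliberately miss.
ADAPTER_BY_HOST = {
    "generativelanguage.googleapis.com": "adapters/gemini.py",
    "cloudcode-pa.googleapis.com": "adapters/gemini_code_assist.py",
}


def adapter_for(url: str):
    """Return the adapter file that owns this URL's host, or None."""
    host = urlsplit(url).hostname  # lowercased by urlsplit; None if absent
    return ADAPTER_BY_HOST.get(host or "")
```

With this lookup, a request to aiplatform.googleapis.com returns `None`, which matches the "not in scope today" statement above.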