The daemon walks a resolver chain and picks the first layer that answers. Git repo basename becomes the project tag. In containers you get docker:<hostname> or k8s:<hostname>. Sandboxed Mac apps tag as their bundle ID (mac:ai.perplexity.mac) instead of collapsing to a generic Data folder.
Know exactly where your AI spend goes.
One command installs a local proxy that intercepts every request to Claude, OpenAI, Gemini, and Grok, including OAuth surfaces like ChatGPT and Gemini Code Assist. Every request attributed to a project. Every token costed to the penny. Stored locally. No cloud dependency. No code changes. No SDK swaps.
LLM spend is real. Cost attribution isn't.
Providers give you one monthly total. No project breakdown. No per-developer view. No way to reconcile five dashboards into one number your finance team will sign off on. That gap is LLM cost attribution: every token priced to the penny and tied to the project that spent it.
You can't say which project caused it. Which developer. Or whether it could've been $4,000. There's no drill-down. Just a total.
Claude, OpenAI, Gemini, Grok: each has its own dashboard. ChatGPT and Gemini Code Assist sit on OAuth surfaces nothing else captures. No unified view. No per-project breakdown. No shared number engineering and finance both trust.
When finance asks how the AI line item breaks down by project, you can't show them. So they don't quite trust it, and next time they push back on the budget.
A local proxy. One SQLite file. All four providers.
Halton Meter intercepts HTTPS traffic to api.anthropic.com, api.openai.com, generativelanguage.googleapis.com, and api.x.ai via mitmproxy, plus OAuth surfaces at chatgpt.com and cloudcode-pa.googleapis.com that nothing else captures. Zero changes to your workflow. Every request is tagged to a project, costed from live provider pricing, and written to a local SQLite file. The dashboard reads over loopback. Nothing leaves your machine.
halton-meter init generates the mitmproxy CA cert, trusts it in the system keychain, and registers the launchd/systemd supervisor. halton-meter start brings the daemon up. A watchdog probes the MITM listener and auto-restarts on failure (capped 3/hour); if the daemon stays down, edge passthrough means your LLM clients keep working: requests fall through directly. Run halton-meter doctor for a full diagnostic.
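The restart cap described above can be pictured as a sliding-window budget. The following is a minimal sketch, not Halton Meter's actual watchdog: the class name `RestartBudget`, the TCP-connect probe, and the injectable clock are all assumptions made for illustration. Only the "3 per hour" limit comes from the text.

```python
import socket
import time
from collections import deque


def listener_is_up(host, port, timeout=1.0):
    """Hypothetical probe: treat a successful TCP connect as 'listener alive'."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


class RestartBudget:
    """Allow at most `limit` restarts inside any `window_s`-second window."""

    def __init__(self, limit=3, window_s=3600.0, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._stamps = deque()  # timestamps of restarts still inside the window

    def try_restart(self):
        now = self.clock()
        # Forget restarts that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window_s:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            # Budget spent: stay down and let requests fall through directly.
            return False
        self._stamps.append(now)
        return True
```

A supervisor loop would call `listener_is_up(...)` on an interval and restart the daemon only when `try_restart()` returns `True`. Once the budget is spent the loop stops restarting, which is the point at which passthrough takes over.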
uvx halton-meter
Prefer pipx? pipx install halton-meter still works.
halton-meter report prints a per-project, per-model cost breakdown from your local SQLite file. Open the dashboard at localhost:3000 for charts and drilldowns. Every figure is computed locally. Not an estimate, not a sampling.
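A per-project, per-model breakdown of this kind is a single GROUP BY over the captured rows. The sketch below uses an in-memory database and a hypothetical schema: the table name `requests`, its columns, and the integer micro-dollar cost unit are assumptions, not the real layout of ~/.halton-meter/db.sqlite.

```python
import sqlite3


def cost_by_project_and_model(conn):
    """Return (project, model, request_count, cost_usd) rows, costliest first."""
    rows = conn.execute(
        """
        SELECT project, model, COUNT(*), SUM(cost_micro_usd)
        FROM requests
        GROUP BY project, model
        ORDER BY SUM(cost_micro_usd) DESC
        """
    ).fetchall()
    # Costs are stored as integer micro-dollars and converted only for display.
    return [(p, m, n, micro / 1_000_000) for p, m, n, micro in rows]


# Illustrative data only; project and model names are placeholders.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE requests (project TEXT, model TEXT, cost_micro_usd INTEGER)"
)
conn.executemany(
    "INSERT INTO requests VALUES (?, ?, ?)",
    [
        ("neon-api", "model-a", 1_250_000),
        ("neon-api", "model-a", 750_000),
        ("clara-backend", "model-b", 5_000_000),
    ],
)
report = cost_by_project_and_model(conn)
```

Storing costs as integers keeps the sums exact, so the report totals match the per-request rows they were built from.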
How pricing accuracy works.
Every cost figure Halton Meter shows you was computed locally, from a dated pricing matrix that shipped inside the version you have installed. The cost path never touches the network. The math is reproducible offline today, next quarter, and two years from now, even if a provider republishes their pricing page tomorrow.
That guarantee rests on four things.
- Bundled, dated rates. Each release of halton-meter carries a pricing matrix sourced from each provider's public pricing page: Anthropic, OpenAI, Google Gemini, and xAI, including Gemini's >200k tiered surcharge. The bundle is dated. You can read it.
- Freshness without surveillance. halton-meter doctor quietly checks whether your bundle is behind the latest published manifest and tells you to upgrade if it is. The check is fail-open and skippable on airgapped installs. It never alters a cost number.
- Negotiated rates, first-class. Customers with their own contracts can override any rate. Overrides survive upgrades and are labelled distinctly in every report.
- Provenance on every row. Every logged request stores the exact rate source that priced it: bundled-2026-05-01 or override. A CSV pulled six months from now is self-describing. No "trust us" maths.
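The four points above amount to a pure function: look up a rate, prefer an override if one exists, multiply by token counts, and return the rate source alongside the cost. This is a minimal sketch under stated assumptions: the `Rate` type, the `example-model` name, and the per-million-token prices are hypothetical, and tiered pricing such as Gemini's >200k surcharge is left out. Only the `bundled-2026-05-01` and `override` labels come from the text.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Rate:
    input_per_mtok: Decimal   # USD per million input tokens
    output_per_mtok: Decimal  # USD per million output tokens
    source: str               # provenance label stored with every priced row


# Hypothetical bundled matrix; real rates ship inside the installed release.
BUNDLED = {
    "example-model": Rate(Decimal("3.00"), Decimal("15.00"), "bundled-2026-05-01"),
}


def price_request(model, input_tokens, output_tokens, overrides=None):
    """Return (cost_usd, rate_source). No network access is involved."""
    # A negotiated override, if present, wins over the bundled rate.
    rate = (overrides or {}).get(model) or BUNDLED[model]
    cost = (
        rate.input_per_mtok * input_tokens + rate.output_per_mtok * output_tokens
    ) / Decimal(1_000_000)
    return cost, rate.source


negotiated = {"example-model": Rate(Decimal("2.00"), Decimal("10.00"), "override")}
bundled_cost, bundled_source = price_request("example-model", 1000, 500)
override_cost, override_source = price_request("example-model", 1000, 500, negotiated)
```

Because the function depends only on its arguments and the bundled table, the same inputs produce the same cost on any machine at any later date, which is the reproducibility property described above.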
Terminal report for the developer. Dashboard for everyone else.
halton-meter report prints a per-project, per-model breakdown in seconds.
The local dashboard at localhost:3000 turns the same SQLite data into charts your CFO and clients can read.
One data source. Two views. Nothing leaves the machine.
| Project | Status | Requests | Spend |
|---|---|---|---|
| clara-backend | Active | 6,720 | $412.80 |
| misc | Active | 3,840 | $235.10 |
| neon-api | Active | 2,100 | $143.60 |
| fieldwork | Archived | 890 | $48.20 |
| unattributed | Active | 1,240 | $39.90 |
Per-project, per-client, per-developer.
Smart Attribution v0.3. Git repo, container, sandbox bundle ID: the daemon resolves the project tag automatically. HALTON_PROJECT overrides when you want to force one. Stop guessing what to charge.
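A resolver chain of this kind can be sketched as an ordered list of functions where the first non-empty answer wins. Everything below is illustrative rather than the daemon's real code: the function names, the `ctx` dictionary, the detection heuristics (a `.git` entry, the `KUBERNETES_SERVICE_HOST` variable, an `in_docker` flag), the chain order, and the `unattributed` fallback are assumptions. The macOS bundle-ID layer is omitted.

```python
import os
from pathlib import Path


def from_env(ctx):
    """Explicit override: HALTON_PROJECT forces the tag."""
    return ctx["env"].get("HALTON_PROJECT") or None


def from_git(ctx):
    """Walk up from the working directory; the repo's basename is the tag."""
    path = Path(ctx["cwd"]).resolve()
    for candidate in (path, *path.parents):
        if (candidate / ".git").exists():
            return candidate.name
    return None


def from_container(ctx):
    """Tag containers as k8s:<hostname> or docker:<hostname>."""
    env = ctx["env"]
    host = env.get("HOSTNAME")
    if not host:
        return None
    if "KUBERNETES_SERVICE_HOST" in env:
        return f"k8s:{host}"
    if ctx.get("in_docker"):
        return f"docker:{host}"
    return None


RESOLVERS = (from_env, from_git, from_container)


def resolve_project(ctx, resolvers=RESOLVERS, fallback="unattributed"):
    """Return the tag from the first resolver that answers."""
    for resolver in resolvers:
        tag = resolver(ctx)
        if tag:
            return tag
    return fallback


tag = resolve_project({"env": dict(os.environ), "cwd": os.getcwd()})
```

Keeping each layer a small function makes the order explicit and lets a new layer be added without touching the others.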
Recommendations that quote a number.
Halton Meter Cloud analyses your last 30 days of local data and tells you exactly where you're overspending, and what swapping models would save. Sits alongside reconciliation against provider billing and a full audit log; the daemon ships the raw data, cloud does the rollup.
Client-ready cost reports.
Raw cost data shipping today: pull CSV or JSON from the local HTTP API at 127.0.0.1 (loopback). Pipe it anywhere.
Halton Meter Cloud, live at cloud.haltonmeter.com: branded PDF invoices, executive summaries, and per-developer cost reports, all on top of an audit log of every captured request. Attach it. Sign it. Send it.
Anthropic, OpenAI, Gemini, xAI in production.
Plus OAuth surfaces (ChatGPT and Gemini Code Assist) that nothing else captures. 6 adapters across 4 providers. Each adapter is a single file under daemon/halton_meter/adapters/.
When one machine isn't enough.
The daemon captures and attributes cost locally. Cloud is the layer that rolls multiple daemons into one view: team-wide LLM cost attribution, reconciled against provider invoices, with branded PDF reports for clients. It is optional. The daemon keeps working whether you pair it or not.
- Team rollups: every developer's daemon in one hosted workspace
- Provider reconciliation: captured spend matched against actual invoices
- Branded PDF reports: client-ready, generated server-side on demand
Every alternative requires a code change or a cloud dependency.
6 adapters across 4 providers. 0 SDK changes. 1 process on loopback. LiteLLM and Helicone need a base-URL swap in every codebase. Langfuse and OpenLLMetry need SDK instrumentation in every repo. Helicone fails closed: a service outage blocks your requests. Halton Meter intercepts at the network layer.
| Capability | Halton Meter | LiteLLM | Langfuse | Helicone | OpenLLMetry |
|---|---|---|---|---|---|
| Local-first (no infrastructure) | Yes | No | No | No | Self-host option |
| Multi-provider observability | Yes | Yes | Yes | Yes | Yes |
| Project-level attribution | Yes | Tag-based | Yes | Yes | Trace-based |
| Zero code changes | Yes | No | No | No | No |
| Fail-open: never blocks requests | Yes | No | Yes | No | Yes |
| Client-ready cost reports | Yes | No | No | No | No |
| Intercepts OAuth surfaces | Yes | No | No | No | No |
| Stores full request + response bodies (redacted) | Yes | No | Full prompt + output, no built-in redaction | Full bodies stored by default; opt-out via header | Configurable; off by default in some SDKs |
Note: Where Halton Meter doesn't yet have something, we say "planned" rather than fake-ticking. Honesty is the product.
Audit, reconciliation, and team rollups.
The local daemon captures every request on one developer's machine. The hosted cloud rolls those captures up across the team, and adds the two things the daemon cannot do alone: a full audit log of every request and policy event, and reconciliation of captured spend against the invoices your providers actually send. Both surfaces share one cost model; the cloud is the team-shaped roof on the local daemon.
Runs on your machine. Stays on your machine.
Halton Meter runs as a local binary on your machine. The captured-request database (~/.halton-meter/db.sqlite) is a file on your disk, never replicated, never uploaded. The HTTP API binds 127.0.0.1 and refuses non-loopback connections. There is no analytics endpoint, no crash reporter, no version-check ping. The only outbound traffic the daemon makes is the LLM provider call you were already going to make. Verify the entire surface with lsof -nP -iTCP -sTCP:LISTEN and Little Snitch. A small dashboard ships alongside at localhost:3000: open source under Apache 2.0, free, an accessory rather than the product.
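Binding to 127.0.0.1 already keeps remote hosts from reaching the API. A server can also check the peer address of each connection as a second guard. The helper below is a small sketch of that check using Python's standard `ipaddress` module. The function name is hypothetical, and whether Halton Meter performs such a check is not stated in the text.

```python
import ipaddress


def is_loopback_peer(peer_host):
    """True only for literal loopback addresses such as 127.0.0.1 or ::1."""
    try:
        return ipaddress.ip_address(peer_host).is_loopback
    except ValueError:
        # Hostnames and malformed input are rejected rather than resolved.
        return False
```

A request handler would call `is_loopback_peer(client_address[0])` and close the connection when it returns `False`.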
From the workshop
Halton Meter is built by Halton Labs, a one-person studio that builds software for regulated industries and uses LLMs heavily across every project. We built this because we needed to show clients exactly what their AI work cost. The daemon is what we run; a small dashboard ships with it. The cloud tier (team aggregation, hosted dashboards, reconciliation against provider invoices) is live at cloud.haltonmeter.com.
We're sharing it because the problem is universal. If you're spending real money on AI and can't produce a per-project breakdown, Halton Meter is the tool. Install it in under a minute. If it doesn't work for you, the issue tracker is open.
Common questions
- What is LLM cost attribution?
- LLM cost attribution ties every LLM API request — and its exact token cost — back to the project that spent it. Halton Meter observes outbound traffic at the proxy, prices each request against published provider rates, and attributes it to a project so you get a per-project LLM cost breakdown instead of one monthly total.
- Does Halton Meter require code changes?
- No. It runs as a transparent local proxy and captures outbound LLM API calls at the network layer. It never wraps your SDKs, so there are no code changes, no API keys to hand over, and no library to import. One command installs it: uvx halton-meter init --apps.
- Which LLM providers does it capture?
- Anthropic (Claude), OpenAI, Google Gemini, and xAI / Grok — including OAuth surfaces like ChatGPT and Gemini Code Assist that other tools miss. That unified, multi-provider view is the LLM observability the provider dashboards do not give you.
- How does token cost tracking work?
- Halton Meter reads the token counts on every captured request and response and prices them against published per-token provider rates, computing the cost to the penny. It is token cost tracking against the real rate card — not an estimate or a sampled average.
- Is my data sent anywhere?
- No. The daemon is local-first: it stores everything in a SQLite file on your disk (~/.halton-meter/db.sqlite) and its HTTP API binds 127.0.0.1 only. There is no analytics, no telemetry, and no version-check ping. Nothing leaves your machine unless you opt into the hosted cloud tier at cloud.haltonmeter.com.
- What does the hosted cloud tier add?
- The optional cloud tier adds team rollups, reconciliation against provider invoices, branded PDF cost reports, and a state-change audit log. It pairs with the local daemon over an opt-in encrypted sync. Pricing (USD): Solo $16/mo annual, Team $99/mo annual (10 seats included). The local daemon stays free.