What is LLM cost attribution?

LLM cost attribution ties every LLM API request — and its exact token cost — back to the project that spent it. Halton Meter observes outbound traffic at the proxy, prices each request against published provider rates, and attributes it to a project so you get a per-project LLM cost breakdown instead of one monthly total.

Does Halton Meter require code changes?

No. It runs as a transparent local proxy and captures outbound LLM API calls at the network layer. It never wraps your SDKs, so there are no code changes, no API keys to hand over, and no library to import. One command installs it: uvx halton-meter init --apps.

Which LLM providers does it capture?

Anthropic (Claude), OpenAI, Google Gemini, and xAI / Grok — including OAuth surfaces like ChatGPT and Gemini Code Assist that other tools miss. That unified, multi-provider view is the LLM observability the provider dashboards do not give you.

How does token cost tracking work?

Halton Meter reads the token counts on every captured request and response and prices them against published per-token provider rates, computing the cost to the penny. It is token cost tracking against the real rate card — not an estimate or a sampled average.
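
As a rough illustration of that arithmetic, here is a minimal sketch of pricing one request from its token counts. The model name, the rates, and the request_cost helper are hypothetical placeholders, not Halton Meter's code or any provider's real prices.

```python
from decimal import Decimal

# Hypothetical rate card in USD per million tokens. These numbers are
# placeholders for illustration, not any provider's published prices.
RATES_PER_MTOK = {
    "example-model": {"input": Decimal("3.00"), "output": Decimal("15.00")},
}

MILLION = Decimal(1_000_000)


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Price one request from its token counts and the model's rates."""
    rate = RATES_PER_MTOK[model]
    total = (
        Decimal(input_tokens) * rate["input"]
        + Decimal(output_tokens) * rate["output"]
    )
    return total / MILLION


# 1,200 input tokens and 800 output tokens at the placeholder rates.
cost = request_cost("example-model", input_tokens=1_200, output_tokens=800)
print(cost)
```

Decimal is used instead of float so that sums over many small requests do not drift by fractions of a cent.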

Is my data sent anywhere?

No. The daemon is local-first: it stores everything in a SQLite file on your disk (~/.halton-meter/db.sqlite) and its HTTP API binds 127.0.0.1 only. There is no analytics, no telemetry, and no version-check ping. Nothing leaves your machine unless you opt into the hosted cloud tier at cloud.haltonmeter.com.

What does the hosted cloud tier add?

The optional cloud tier adds team rollups, reconciliation against provider invoices, branded PDF cost reports, and a state-change audit log. It pairs with the local daemon over an opt-in encrypted sync. Pricing (USD): Solo $16/mo annual, Team $99/mo annual (10 seats included). The local daemon stays free.

Halton Meter Cloud

Know exactly where
your AI spend goes.

One command installs a local proxy that intercepts every request to Claude, OpenAI, Gemini, and Grok, including OAuth surfaces like ChatGPT and Gemini Code Assist. Every request attributed to a project. Every token costed to the penny. Stored locally. No cloud dependency. No code changes. No SDK swaps.


~/work · halton-meter report

zsh · 132×40

halton-meter › cost report

v0.5.0 · May 1 → May 7

— › summary —

Requests 3,861

Coverage 20 projects · 7 models

Tokens 801,370 in · 1,991,887 out

Total cost $250.03

— › by project —

project	requests	cost	avg	p95	7d trend
clara-backend	1,568	$107.20	9.4s	32.9s	▄▅▅▇█▆▇
misc	1,005	$61.14	8.3s	27.1s	▅▅▆▅▇▇▆
unattributed	525	$37.37	11.1s	35.4s	▃▄▅▃▄▅▅
neon-api	409	$31.40	11.3s	37.2s	▅▅▄▆▅▇▆
fieldwork	312	$11.42	6.8s	18.5s	▂▃▃▄▃▅▄

— › by model —

model	requests	in	out	cost	avg	p95
claude-opus-4-7	3,127	619,557	1,722,829	$235.10	9.9s	33.0s
claude-sonnet-4-6	267	47,223	130,289	$9.97	10.5s	39.5s
claude-haiku-4-5	453	28,267	137,960	$4.93	5.9s	17.3s
gemini-2.5-flash	6	106,323	809	$0.03	4.7s	7.6s
gpt-4o	1	0	0	$0.00	955ms	955ms

02 The problem

LLM spend is real. Price attribution isn't.

Providers give you one monthly total. No project breakdown. No per-developer view. No way to reconcile five dashboards into one number your finance team will sign off on. That gap is LLM price attribution: every token priced to the penny and tied to the project that spent it.

$8,400

The invoice arrived

You can't say which project caused it. Which developer. Or whether it could've been $4,000. There's no drill-down. Just a total.

4 providers

No single source of truth

Claude, OpenAI, Gemini, Grok: each has its own dashboard. ChatGPT and Gemini Code Assist sit on OAuth surfaces nothing else captures. No unified view. No per-project breakdown. No shared number engineering and finance both trust.

1 invoice

Client asks for a breakdown

How does the AI line item break down by project? You can't show them. So they don't quite trust it, and next time they push back on the budget.

03 How it works

A local proxy. One SQLite file. All four providers.

Halton Meter intercepts HTTPS traffic to api.anthropic.com, api.openai.com, generativelanguage.googleapis.com, and api.x.ai via mitmproxy, plus OAuth surfaces at chatgpt.com and cloudcode-pa.googleapis.com that nothing else captures. Zero changes to your workflow. Every request is tagged to a project, costed from the bundled, dated provider pricing matrix, and written to a local SQLite file. The dashboard reads over loopback. Nothing leaves your machine.

Install

One command. Proxy + cert + supervisor.

halton-meter init generates the mitmproxy CA cert, trusts it in the system keychain, and registers the launchd/systemd supervisor. halton-meter start brings the daemon up. A watchdog probes the MITM listener and auto-restarts on failure (capped 3/hour); if the daemon stays down, edge passthrough means your LLM clients keep working: requests fall through directly. Run halton-meter doctor for a full diagnostic.
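
One way to picture the restart cap is a sliding-window budget. The sketch below is an assumption about how a "3 per hour" cap could be enforced; the RestartBudget class is hypothetical and the source does not say whether the real watchdog uses a sliding or a fixed window.

```python
import time
from collections import deque
from typing import Optional


class RestartBudget:
    """Allow at most `limit` restarts within any sliding `window` seconds."""

    def __init__(self, limit: int = 3, window: float = 3600.0) -> None:
        self.limit = limit
        self.window = window
        self._stamps = deque()  # timestamps of recent restarts, oldest first

    def try_restart(self, now: Optional[float] = None) -> bool:
        """Return True and record the restart if the budget allows one."""
        now = time.monotonic() if now is None else now
        # Forget restarts that have aged out of the window.
        while self._stamps and now - self._stamps[0] >= self.window:
            self._stamps.popleft()
        if len(self._stamps) >= self.limit:
            return False  # budget spent: leave the daemon down
        self._stamps.append(now)
        return True


# Four failures in three minutes, then one more an hour later.
budget = RestartBudget()
results = [budget.try_restart(now=t) for t in (0, 60, 120, 180, 3700)]
print(results)
```

When try_restart returns False the watchdog would stop restarting, which is the point at which the passthrough behaviour described above keeps client requests flowing.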

· Python 3.11+ · macOS · Linux · Windows (beta) · No telemetry

$ uvx halton-meter

Prefer pipx? pipx install halton-meter still works.

Tag

Smart Attribution. Zero config.

The daemon walks a resolver chain and picks the first layer that answers. Git repo basename becomes the project tag. In containers you get docker:<hostname> or k8s:<hostname>. Sandboxed Mac apps tag as their bundle ID (mac:ai.perplexity.mac) instead of collapsing to a generic Data folder.
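
A resolver chain of this kind can be sketched in a few lines. Everything below is illustrative: the function names, the ordering, the /.dockerenv heuristic, and the "unattributed" fallback are assumptions, not the daemon's actual resolvers.

```python
import os
import socket
import tempfile
from pathlib import Path
from typing import Callable, List, Optional

Resolver = Callable[[Path], Optional[str]]


def from_env(_cwd: Path) -> Optional[str]:
    """Explicit override via the HALTON_PROJECT environment variable."""
    return os.environ.get("HALTON_PROJECT") or None


def from_git(cwd: Path) -> Optional[str]:
    """Basename of the nearest ancestor directory that contains .git."""
    for directory in (cwd, *cwd.parents):
        if (directory / ".git").exists():
            return directory.name
    return None


def from_container(_cwd: Path) -> Optional[str]:
    """docker:<hostname> if a Docker marker file exists (assumed heuristic)."""
    if Path("/.dockerenv").exists():
        return f"docker:{socket.gethostname()}"
    return None


def resolve_project(cwd: Path, chain: List[Resolver]) -> str:
    """Walk the chain in order and return the first non-empty answer."""
    for resolver in chain:
        tag = resolver(cwd)
        if tag:
            return tag
    return "unattributed"  # assumed fallback label


# Demo: a throwaway repo layout, resolved from a nested directory.
with tempfile.TemporaryDirectory() as tmp:
    repo = Path(tmp) / "clara-backend"
    (repo / ".git").mkdir(parents=True)
    nested = repo / "src" / "api"
    nested.mkdir(parents=True)
    tag = resolve_project(nested, [from_git, from_container])
    print(tag)
```

Putting from_env at the head of the chain would give an explicit override priority over every automatic layer.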

0 SDK swaps · 0 base URL changes · 0 instrumentation

Report

Run the report. See the number.

halton-meter report prints a per-project, per-model cost breakdown from your local SQLite file. Open the dashboard at localhost:3000 for charts and drilldowns. Every figure is computed locally. Not an estimate, not a sampling.

halton-meter report --since 7d --project clara-backend
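
Conceptually, a per-project report is a grouped sum over logged requests. The sketch below uses an in-memory SQLite database with a made-up requests table; the real schema of ~/.halton-meter/db.sqlite is not documented here, so every table and column name is an assumption.

```python
import sqlite3

# In-memory database with a hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE requests ("
    "project TEXT, model TEXT, input_tokens INTEGER, "
    "output_tokens INTEGER, cost_cents INTEGER)"
)
rows = [
    ("clara-backend", "model-a", 1200, 800, 156),
    ("clara-backend", "model-b", 300, 100, 12),
    ("misc", "model-a", 500, 400, 75),
]
conn.executemany("INSERT INTO requests VALUES (?, ?, ?, ?, ?)", rows)

# One row per project: request count and total cost, biggest spender first.
report = conn.execute(
    """
    SELECT project, COUNT(*) AS n_requests, SUM(cost_cents) AS total_cents
    FROM requests
    GROUP BY project
    ORDER BY total_cents DESC
    """
).fetchall()

for project, n_requests, total_cents in report:
    print(f"{project:<16}{n_requests:>6}  ${total_cents / 100:.2f}")
```

Narrowing to one project or a date range, as the command above does, would correspond to a WHERE clause on the same query.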

04 Pricing accuracy

How pricing accuracy works.

Every cost figure Halton Meter shows you was computed locally, from a dated pricing matrix that shipped inside the version you have installed. The cost path never touches the network. The math is reproducible offline today, next quarter, and two years from now, even if a provider republishes their pricing page tomorrow.

That guarantee rests on four things.

Bundled, dated rates.

Each release of halton-meter carries a pricing matrix sourced from each provider's public pricing page: Anthropic, OpenAI, Google Gemini, and xAI, including Gemini's >200k tiered surcharge. The bundle is dated. You can read it.

Freshness without surveillance.

halton-meter doctor quietly checks whether your bundle is behind the latest published manifest and tells you to upgrade if it is. The check is fail-open and skippable on airgapped installs. It never alters a cost number.

Negotiated rates, first-class.

Customers with their own contracts can override any rate. Overrides survive upgrades and are labelled distinctly in every report.

Provenance on every row.

Every logged request stores the exact rate source that priced it: bundled-2026-05-01 or override. A CSV pulled six months from now is self-describing. No "trust us" maths.
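
To make the override and provenance ideas concrete, here is a minimal sketch of a rate lookup that prefers a negotiated rate and records which source priced the row. The rate values, dictionary layout, and function names are hypothetical; only the two source labels come from the text above.

```python
from decimal import Decimal

BUNDLE_DATE = "2026-05-01"  # matches the label format quoted above

# Hypothetical USD rates per million tokens, not real provider prices.
BUNDLED = {
    "example-model": {"input": Decimal("3.00"), "output": Decimal("15.00")},
}
OVERRIDES = {
    "example-model": {"input": Decimal("2.40"), "output": Decimal("12.00")},
}


def lookup_rate(model, overrides):
    """Return (rates, source label), preferring a customer override."""
    if model in overrides:
        return overrides[model], "override"
    return BUNDLED[model], f"bundled-{BUNDLE_DATE}"


def price_row(model, input_tokens, output_tokens, overrides=OVERRIDES):
    """Price one request and attach the source of the rate that priced it."""
    rates, source = lookup_rate(model, overrides)
    cost = (
        input_tokens * rates["input"] + output_tokens * rates["output"]
    ) / Decimal(1_000_000)
    return {"model": model, "cost_usd": cost, "rate_source": source}


negotiated = price_row("example-model", 1_000, 1_000)
list_price = price_row("example-model", 1_000, 1_000, overrides={})
print(negotiated["rate_source"], negotiated["cost_usd"])
print(list_price["rate_source"], list_price["cost_usd"])
```

Because the label travels with the row, an exported record can be re-priced or audited later without knowing which version of the tool produced it.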

See the full token cost tracking methodology

05 Product

Terminal report for the developer. Dashboard for everyone else.

halton-meter report prints a per-project, per-model breakdown in seconds. The local dashboard at localhost:3000 turns the same SQLite data into charts your CFO and clients can read. One data source. Two views. Nothing leaves the machine.

Overview

All projects, last 30 days

Spend this month

$879.60

Requests

14,790

Active projects

5 total

Spend over time

Stacked by provider · last 30 days

anthropic

openai

google

grok

Projects

Project	Status	Requests	Spend
clara-backend	Active	6,720	$412.80
misc	Active	3,840	$235.10
neon-api	Active	2,100	$143.60
fieldwork	Archived	890	$48.20
unattributed	Active	1,240	$39.90

ATTRIBUTION

Per-project, per-client, per-developer.

Smart Attribution v0.3. Git repo, container, sandbox bundle ID: the daemon resolves the project tag automatically. HALTON_PROJECT overrides when you want to force one. Stop guessing what to charge.

clara-backend

1,568 requests · 343.8k in / 854.5k out

$107.20

misc

1,005 requests · 196.3k in / 488.0k out

$61.14

unattributed

525 requests · 119.4k in / 296.8k out

$37.37

OPTIMISATION

CLOUD · LIVE NOW

Recommendations that quote a number.

Halton Meter Cloud analyses your last 30 days of local data and tells you exactly where you're overspending, and what swapping models would save. Sits alongside reconciliation against provider billing and a full audit log; the daemon ships the raw data, cloud does the rollup.

47 Opus calls could have been Sonnet

halton-meter · ≤6k input tokens, no thinking required

−$31.20

per day

Enable prompt caching on system prompt

staffhub · 12.4k tokens repeated across 1,842 requests

−$18.40

per day

Route validation calls to Haiku

haltonlabs · 318 calls sub-1k tokens, no reasoning

−$8.80

per day

REPORTS

CLOUD · LIVE NOW

Client-ready cost reports.

Raw cost data ships today: pull CSV or JSON from the local HTTP API at 127.0.0.1 (loopback). Pipe it anywhere.

Halton Meter Cloud, live at cloud.haltonmeter.com: branded PDF invoices, executive summaries, and per-developer cost reports, all on top of an audit log of every captured request. Attach it. Sign it. Send it.

Cost Report · April 2026

halton-meter

internal · 28 days · generated by halton-meter v0.5.0

Total

$120.01

Requests

2,378

Tokens

1.25M

COVERAGE

Anthropic, OpenAI, Gemini, xAI in production.

Plus OAuth surfaces (ChatGPT and Gemini Code Assist) that nothing else captures. 6 adapters across 4 providers. Each adapter is a single file under daemon/halton_meter/adapters/.

Anthropic

Production · v0.5.0

OpenAI

Beta adapter

Google

Beta adapter

xAI / Grok

Beta adapter

OAUTH

ChatGPT

OAuth · chatgpt.com

OAUTH

Code Assist

OAuth · cloudcode-pa

Adapters live in daemon/halton_meter/adapters/

06 Cloud

When one machine isn't enough.

The daemon captures and attributes cost locally. Cloud is the layer that rolls multiple daemons into one view: team-wide LLM cost attribution, reconciled against provider invoices, with branded PDF reports for clients. It is optional. The daemon keeps working whether you pair it or not.

Team rollups: every developer's daemon in one hosted workspace
Provider reconciliation: captured spend matched against actual invoices
Branded PDF reports: client-ready, generated server-side on demand

cloud.haltonmeter.com

07 Compare

Every alternative requires a code change or a cloud dependency.

6 adapters across 4 providers. 0 SDK changes. 1 process on loopback. LiteLLM and Helicone need a base-URL swap in every codebase. Langfuse and OpenLLMetry need SDK instrumentation in every repo. Helicone fails closed: a service outage blocks your requests. Halton Meter intercepts at the network layer.

Capability	Halton Meter	LiteLLM	Langfuse	Helicone	OpenLLMetry
Local-first (no infrastructure)	Yes	No	No	No	Self-host option
Multi-provider observability	Yes	Yes	Yes	Yes	Yes
Project-level attribution	Yes	Tag-based	Yes	Yes	Trace-based
Zero code changes	Yes	No	No	No	No
Fail-open: never blocks requests	Yes	No	Yes	No	Yes
Client-ready cost reports	Yes	No	No	No	No
Intercepts OAuth surfaces	Yes	No	No	No	No
Stores full request + response bodies (redacted)	Yes	No	Full prompt + output, no built-in redaction	Full bodies stored by default; opt-out via header	Configurable; off by default in some SDKs

NOTE Where Halton Meter doesn't yet have something, we say "planned" rather than fake-ticking. Honesty is the product.

08 For teams

Live now · cloud.haltonmeter.com

Audit, reconciliation, and team rollups.

The local daemon captures every request on one developer's machine. The hosted cloud rolls those captures up across the team, and adds the two things the daemon cannot do alone: a full audit log of every request and policy event, and reconciliation of captured spend against the invoices your providers actually send. Both surfaces share one cost model; the cloud is the team-shaped roof on the local daemon.

TEAM

Cross-machine team visibility

Roll up spend across every developer's local daemon into one hosted dashboard.

AUDIT

Every request, logged

Every captured request and config change, queryable and exportable when a client asks.

RECONCILIATION

Match captured spend to provider billing

The cloud reconciles your captured spend against the actual invoices each provider sends.

ANALYSIS

Continuous optimisation

Automated recommendations from real usage patterns, not synthetic benchmarks.

TRENDING

Historical trending

Three-month, six-month, year-over-year cost analysis with project drilldown.

REPORTS

Custom client reports

Branded PDF reports on demand, with cost breakdown and methodology.

Runs on your machine. Stays on your machine.

Halton Meter runs as a local binary on your machine. The captured-request database (~/.halton-meter/db.sqlite) is a file on your disk, never replicated, never uploaded. The HTTP API binds 127.0.0.1 and refuses non-loopback connections. There is no analytics endpoint, no crash reporter, no version-check ping. The only outbound traffic the daemon makes is the LLM provider call you were already going to make. Verify the entire surface with lsof -nP -iTCP -sTCP:LISTEN and Little Snitch. A small dashboard ships alongside at localhost:3000: open source under Apache 2.0, free, an accessory rather than the product.
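
If you want to script the lsof check, a small helper can flag any listener that is not loopback-only. This is a sketch that assumes listen addresses in the host:port form lsof -nP prints (for example 127.0.0.1:3000, [::1]:3000, or *:8080); the is_loopback_listener function is hypothetical and not part of Halton Meter.

```python
import ipaddress


def is_loopback_listener(address: str) -> bool:
    """True if a 'host:port' listen address is reachable only via loopback."""
    host, _, _port = address.rpartition(":")
    host = host.strip("[]")  # IPv6 hosts are printed as [::1]
    if host in ("*", ""):
        return False  # wildcard: listening on every interface
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return host == "localhost"  # non-numeric host name


# Sample addresses in the assumed lsof format.
samples = ["127.0.0.1:3000", "[::1]:3000", "*:8080", "0.0.0.0:3000"]
exposed = [addr for addr in samples if not is_loopback_listener(addr)]
print(exposed)
```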

v0.5.0

on PyPI now

17k

installs · all-time

SQLite

local · no cloud

outbound endpoints

Read the docs Inspect the network surface

From the workshop

Halton Meter is built by Halton Labs, a one-person studio that builds software for regulated industries and uses LLMs heavily across every project. We built this because we needed to show clients exactly what their AI work cost. The daemon is what we run; a small dashboard ships with it. The cloud tier (team aggregation, hosted dashboards, reconciliation against provider invoices) is live at cloud.haltonmeter.com.

We're sharing it because the problem is universal. If you're spending real money on AI and can't produce a per-project breakdown, Halton Meter is the tool. Install it in under a minute. If it doesn't work for you, the issue tracker is open.

vk · halton labs MAY '26 · v0.5.0

