Are the prices accurate?

The prices are editable estimates as of June 2026 and provided for convenience. Provider pricing changes often — edit any rate inline, and verify against the official Anthropic, OpenAI, and Google pricing pages.

Does prompt caching really lower cost?

Yes. On a cache hit, the cached input is billed at roughly 10% of the base input rate, so repetitive context (system prompts, project files) costs far less. The cache-hit slider models this.

LLM API Cost Calculator

Estimate what Claude, GPT, and Gemini will actually cost for your workload. Enter your average tokens per request and daily volume — the table updates live with per-request, per-day, and monthly cost for every model. Every price is editable.

Avg input tokens / request

Avg output tokens / request

Requests / day

Prompt-cache hit (input) — 0%

Model	Input $/M	Output $/M	$ / request	$ / day	$ / month

Prices are editable estimates as of June 2026 (USD per 1M tokens) — edit any cell to match what you actually pay. Always confirm against the official Anthropic, OpenAI, and Google pricing pages. The cheapest monthly total is highlighted.

How the math works

The formula is simple — the hard part is knowing your real token counts:

cost per request = (input_tokens × input_price + output_tokens × output_price) ÷ 1,000,000
cost per month   = cost per request × requests_per_day × 30

Two things people consistently underestimate: output tokens are far more expensive than input (often 4–5×), and agentic tools resend context every turn, so a single "task" can be dozens of requests. If your real usage is higher than expected, see reducing Cursor token usage.

Why prompt caching matters

If you send the same system prompt or project context repeatedly, prompt caching bills the cached portion at roughly 10% of the input rate on a cache hit. Drag the cache-hit slider up to see the effect — for repetitive workloads it's often the single biggest lever on your bill, bigger than switching models.

See your actual per-request cost — one endpoint, every model

This tool estimates list prices. TokenProvider logs the real tokens and cost of every call across Claude, GPT, and Gemini from a single key, so you can stop guessing.

FAQ

How is API cost calculated?

Cost per request = (input tokens × input price + output tokens × output price) ÷ 1,000,000, then × requests/day × 30 for a month. Caching lowers the input portion on hits.

Are these prices current?

They're editable estimates as of June 2026. Provider pricing moves often — edit any rate inline and verify against the official pricing pages linked above.

How do I find my real token counts?

Check a per-request usage log. Most workloads have far higher input tokens than people guess because of system prompts and resent context.