What Is an AI Token?

A token is the smallest unit of text a large language model reads or writes. Not a word, not a character — usually a sub-word fragment. Every modern LLM API (Claude, ChatGPT, Gemini) bills per token of input plus per token of output, which is why the same idea can cost more or less depending on the language and wording you use.

Rule of thumb: 1 token ≈ 4 characters of English, or about 0.75 words. So a typical 250-word email is roughly 330 tokens. Chinese, Japanese, code, and emoji tokenize differently.
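The rule of thumb above is easy to turn into code. A minimal sketch (the function name and the 4-chars-per-token constant are illustrative, not from any SDK):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate using the ~4 chars/token rule for English prose."""
    return max(1, round(len(text) / 4))

# A 250-word email is roughly 1,325 characters including spaces:
print(estimate_tokens("x" * 1325))  # → 331, i.e. "roughly 330 tokens"
```

This is only a budgeting heuristic — for code, CJK text, or emoji the chars-per-token ratio shifts, as discussed below.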

Why tokens, not words?

Models read text after passing it through a tokenizer — a fixed vocabulary of common substrings (e.g., "ing", "tion", " the"). Splitting on substrings instead of words gives the model a constant-size vocabulary that handles new words, code, and languages it has never seen, by composing them from familiar fragments.

Example tokenization of "Tokenization is unsupervised":

["Token", "ization", " is", " un", "super", "vised"]   ← 6 tokens
("Tokenization is unsupervised" = 28 characters)
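You can reproduce this split with a toy longest-match tokenizer. Real tokenizers (BPE, SentencePiece) learn vocabularies of tens of thousands of entries from data; the hand-picked six-entry vocabulary here exists only to illustrate the mechanism:

```python
# Toy greedy longest-match subword tokenizer. The vocabulary is hand-picked
# for this one sentence — real vocabularies are learned, not written by hand.
VOCAB = {"Token", "ization", " is", " un", "super", "vised"}
MAX_LEN = max(len(v) for v in VOCAB)

def tokenize(text: str) -> list[str]:
    tokens, i = [], 0
    while i < len(text):
        # Try the longest vocabulary entry that matches at position i.
        for length in range(min(MAX_LEN, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in VOCAB:
                tokens.append(piece)
                i += length
                break
        else:
            # Unknown character: emit it alone (real tokenizers fall back
            # to byte-level pieces instead, so nothing is ever "unknown").
            tokens.append(text[i])
            i += 1
    return tokens

print(tokenize("Tokenization is unsupervised"))
# → ['Token', 'ization', ' is', ' un', 'super', 'vised']
```

This composing-from-fragments behavior is exactly why a model can handle words it has never seen: they still decompose into familiar pieces.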

How input and output tokens differ

                 Input tokens                          Output tokens
What they are    Tokens you send (prompt + history)    Tokens the model generates
Typical price    Base rate                             3–5× input
Why pricier      Just read                             Generated one-by-one through the full model

Output is more expensive because each token requires a full forward pass through the model. Long prompts are cheap, long answers are not.

Token prices for popular models (2026)

Model              Input / 1M   Output / 1M   Context limit
Claude Sonnet 4    $3           $15           200k
Claude Opus 4      $15          $75           200k
Claude Haiku 4     $0.80        $4            200k
GPT-4o             $2.50        $10           128k
GPT-4o mini        $0.15        $0.60         128k
Gemini 2.5 Pro     $1.25        $5            2M
Gemini 2.5 Flash   $0.075       $0.30         1M

Prices verified May 2026; check vendor pricing pages for current rates.
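The table makes it easy to compare the same call across models. A small sketch, using the per-million-token prices listed above (model keys and function name are made up for illustration):

```python
# Per-million-token prices from the table above: model -> (input, output).
PRICES = {
    "claude-sonnet-4":  (3.00, 15.00),
    "gpt-4o":           (2.50, 10.00),
    "gpt-4o-mini":      (0.15, 0.60),
    "gemini-2.5-flash": (0.075, 0.30),
}

def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one call at the listed per-1M-token rates."""
    inp, out = PRICES[model]
    return (input_tokens * inp + output_tokens * out) / 1e6

# The same 500-in / 400-out call on each model:
for model in PRICES:
    print(f"{model}: ${call_cost(model, 500, 400):.5f}")
```

Running this makes the spread concrete: the identical call costs about 35× more on GPT-4o than on GPT-4o mini.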

How to estimate cost before you call

  1. Count characters in your prompt (system + user messages + history).
  2. Divide by 4 → approximate input tokens.
  3. Estimate expected response length (e.g., 300 words ≈ 400 output tokens).
  4. Multiply by per-token price for your model.

Example: GPT-4o, 2,000-char prompt + 400 output tokens
(2000/4) × $0.0000025 + 400 × $0.00001 = $0.00125 + $0.004 = $0.00525 per call.
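The four steps above fit in one small function. A sketch (the function and its parameters are illustrative, not part of any SDK):

```python
def estimate_cost(prompt_chars: int, expected_words: int,
                  input_price_per_m: float, output_price_per_m: float) -> float:
    """Steps 1-4: chars/4 -> input tokens, words/0.75 -> output tokens,
    then multiply by the per-1M-token prices."""
    input_tokens = prompt_chars / 4
    output_tokens = expected_words / 0.75
    return (input_tokens * input_price_per_m
            + output_tokens * output_price_per_m) / 1e6

# The GPT-4o example above: 2,000-char prompt, ~300-word answer.
print(f"${estimate_cost(2000, 300, 2.50, 10.00):.5f}")  # → $0.00525
```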

Counting exactly (no guessing)

For exact counts:

  • OpenAI: the tiktoken Python package (encoding_for_model("gpt-4o")) gives per-message counts.
  • Anthropic: the SDK exposes client.messages.count_tokens() (no model run, free).
  • Gemini: the SDK exposes client.models.count_tokens().

Every API response also returns the actual usage.input_tokens / output_tokens — log them and sum them daily, and you have ground-truth billing.

Token quirks to budget for

  • Conversation history grows linearly: each follow-up resends every prior turn. By turn 20, prompts are huge.
  • Code costs more: indentation and identifiers tokenize less efficiently than prose. Expect ~3 chars/token for source code.
  • Non-Latin scripts cost more per character: Chinese / Japanese / Korean often run 1.5–3 chars per token.
  • System prompts charge every call: a 2k-token system prompt costs you 2k tokens on every request — use prompt caching when available.
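The first quirk is worth seeing in numbers: because each turn resends the whole transcript, cumulative input tokens grow quadratically with turn count. A sketch with assumed sizes (200 tokens per message, 500-token system prompt — both made-up round numbers):

```python
def total_input_tokens(turns: int, tokens_per_turn: int = 200,
                       system_tokens: int = 500) -> int:
    """Total billed input tokens across a conversation, assuming each user
    message and each model reply are ~tokens_per_turn long."""
    total, history = 0, 0
    for _ in range(turns):
        total += system_tokens + history + tokens_per_turn  # this turn's prompt
        history += 2 * tokens_per_turn  # the turn + its reply join the history
    return total

print(total_input_tokens(1))   # → 700
print(total_input_tokens(20))  # → 90000
```

One turn costs 700 input tokens; by turn 20 the conversation has billed 90,000 — over six times what twenty independent calls would.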

Save tokens (and money)

  • Trim system prompts. Specifics beat verbosity.
  • Cap max_tokens to the actual length you need.
  • Use prompt caching (Claude / Gemini) for the static parts of long prompts — caches cost 10–25% of normal input.
  • Pick the smallest model that solves the task. Sonnet for code, Haiku for triage.
  • Use a discount proxy: same tokens, lower per-token rate.

Same tokens, ~50% off

TokenProvider routes Claude / ChatGPT / Gemini through one key, billed per token at a discount.
