Reduce Cursor Token Usage

If Cursor is burning through Claude tokens faster than you expect, the cause is usually context bloat, not the model's per-token price. Here's where the tokens actually go and the settings that cut spend without slowing you down.

Biggest wins: add a .cursorignore, stop @-mentioning whole folders, use a fast cheap model for autocomplete (reserve Sonnet/Opus for Composer), and start fresh chats often. Then watch a per-request usage log to confirm what's actually costing you.

Where Cursor's tokens go

Source	Why it adds up
Context resend	Every Composer turn resends the conversation + attached files as input tokens
Autocomplete (Tab)	Highest-frequency call — many tiny requests across a day
Apply diff	A second model call to apply each suggested edit
Large @-mentions	@-folder or @-codebase can attach far more than you need
Indexed codebase	Big/generated files inflate retrieved context

Six fixes, biggest first

Add a .cursorignore

Exclude generated and heavy directories so they never enter context or the index:
```
# .cursorignore
node_modules/
dist/
build/
.next/
*.lock
*.min.js
coverage/
```
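
To decide which directories belong in the ignore file, it can help to see which top-level folders hold the most bytes. The following is a minimal sketch, not a Cursor feature: `size_by_top_level_dir` and `report` are hypothetical helpers, and the 4-bytes-per-token figure is a rough rule of thumb rather than a tokenizer count.

```python
import os
from collections import Counter
from pathlib import Path


def size_by_top_level_dir(root: str) -> Counter:
    """Sum file sizes (bytes) under each top-level entry of `root`."""
    root_path = Path(root)
    totals: Counter = Counter()
    for dirpath, dirnames, filenames in os.walk(root_path):
        # Skip VCS metadata; it is never useful as model context.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        rel = Path(dirpath).relative_to(root_path)
        # Files that sit directly in the root are grouped under ".".
        top = rel.parts[0] if rel.parts else "."
        for name in filenames:
            try:
                totals[top] += (Path(dirpath) / name).stat().st_size
            except OSError:
                continue  # broken symlink or unreadable file
    return totals


def report(root: str = ".", top_n: int = 10) -> None:
    """Print the heaviest top-level entries with a rough token estimate."""
    for name, size in size_by_top_level_dir(root).most_common(top_n):
        # Assumption: ~4 bytes per token, a coarse heuristic only.
        print(f"{name:<30} {size / 1_000_000:8.2f} MB  ~{size // 4:,} tokens")
```

Calling `report(".")` from the repository root lists the heaviest entries first; anything generated or vendored near the top of that list is a candidate for `.cursorignore`.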

Be surgical with @-mentions

Attach the specific files you're working on, not @codebase or a whole @folder. The model rarely needs the entire tree to edit one module.

Split autocomplete from Composer

Use a fast, cheap model for Tab completions and keep Sonnet/Opus for Composer edits. Autocomplete frequency is what quietly dominates a day's spend.

Start fresh chats often

A long Composer thread resends its entire history every turn. When you switch tasks, open a new chat so you're not paying to resend stale context.
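
The cost of resending history can be approximated with simple arithmetic. The sketch below assumes every turn resends a fixed attached context plus all earlier turns, that each turn adds a constant number of tokens, and that no prompt caching applies. `cumulative_input_tokens` is a hypothetical helper and the numbers are illustrative, not measurements of Cursor.

```python
def cumulative_input_tokens(turns: int, base_context: int, tokens_per_turn: int) -> int:
    """Total input tokens billed across a thread in which every turn
    resends the attached context plus the whole conversation so far."""
    total = 0
    history = 0
    for _ in range(turns):
        history += tokens_per_turn       # the new exchange joins the history
        total += base_context + history  # context and full history are sent again
    return total


# Illustrative numbers only: 5,000 tokens of attached files, 1,000 tokens per turn.
one_long = cumulative_input_tokens(20, base_context=5_000, tokens_per_turn=1_000)
two_short = 2 * cumulative_input_tokens(10, base_context=5_000, tokens_per_turn=1_000)
print(one_long, two_short)
```

Under these assumptions one 20-turn thread bills 310,000 input tokens while two 10-turn threads bill 210,000, because the history term grows roughly with the square of the thread length.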

Right-size the model per task

Use Haiku/GPT-4o-mini for lookups and explanations, Sonnet for normal edits, Opus only for hard multi-file refactors. Most edits don't need the biggest model.
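
To see why tiering matters, the cost of a single request can be estimated from its token counts and the per-million-token prices. This is a minimal sketch: `request_cost` is a hypothetical helper, and the prices below are placeholders chosen for round numbers, not any provider's actual rates.

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Cost in dollars of one request, given prices per million tokens."""
    return (input_tokens * input_price_per_mtok
            + output_tokens * output_price_per_mtok) / 1_000_000


# Placeholder prices (dollars per million tokens); substitute your provider's rates.
small = request_cost(8_000, 500, input_price_per_mtok=1.0, output_price_per_mtok=5.0)
large = request_cost(8_000, 500, input_price_per_mtok=15.0, output_price_per_mtok=75.0)
print(f"small tier: ${small:.4f}  large tier: ${large:.4f}")
```

With these placeholder prices the same 8,000-token lookup costs 15 times more on the large tier, so oversized context and an oversized model multiply each other.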

Measure, then trim

Route Cursor through a metered endpoint and read the per-request log: model, tokens, cost. Once you can see whether autocomplete, Composer, or apply is the culprit, the fix is obvious. For setup, see the Cursor + Claude API proxy guide.
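
Once a per-request log exists, totalling it per model shows the biggest line item. The sketch below assumes the log can be exported as JSON Lines with the fields `model`, `input_tokens`, `output_tokens`, and `cost_usd`. Those field names and the `summarize_usage` helper are hypothetical, so adapt them to whatever your endpoint actually emits.

```python
import json
from collections import defaultdict


def summarize_usage(lines):
    """Group per-request records by model and total their tokens and cost.

    `lines` is any iterable of JSON strings, one request per line
    (assumed format; field names are illustrative).
    """
    totals = defaultdict(
        lambda: {"requests": 0, "input_tokens": 0, "output_tokens": 0, "cost_usd": 0.0}
    )
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the export
        rec = json.loads(line)
        row = totals[rec["model"]]
        row["requests"] += 1
        row["input_tokens"] += rec["input_tokens"]
        row["output_tokens"] += rec["output_tokens"]
        row["cost_usd"] += rec["cost_usd"]
    # Most expensive model first.
    return sorted(totals.items(), key=lambda kv: kv[1]["cost_usd"], reverse=True)
```

Passing an open file works directly, for example `summarize_usage(open("usage.jsonl"))` with a hypothetical export path. A model with many requests and small inputs points at autocomplete, while few requests with very large inputs point at Composer context.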

Bloat that also causes 429s

The same oversized context that runs up tokens also pushes you into per-minute input-token limits. If you're seeing rate-limit errors alongside high spend, see the guide on fixing Anthropic API 429s.

FAQ

Why does Cursor use so many tokens?

It resends context every turn, autocomplete fires constantly, and each apply is a second call. Large @-mentions and a big indexed codebase inflate every request.

Will a smaller autocomplete model hurt quality?

Rarely — completions are short and local. The quality that matters lives in Composer edits, which you keep on Sonnet or Opus.

How do I see where my tokens go?

Use a metered endpoint with a per-request usage log so each Cursor call shows model, tokens, and cost — then trim the biggest line item.
