Roo Code Custom LLM Setup
Roo Code is an autonomous coding agent for VS Code, and it lets you bring any OpenAI-compatible or Anthropic endpoint. Point it at a single gateway and you get Claude, Claude Code, ChatGPT, and Gemini behind one base URL and one key — with pay-as-you-go, per-request billing instead of four separate plans. Setup time: 3 minutes.
The fast path: Roo Code → settings (gear) → API Provider =
OpenAI Compatible → Base URL https://tokenprovider.store/v1,
paste your key, set a model ID like claude-sonnet-4-20250514. Done.
Two ways to connect Roo Code
Roo Code exposes several API providers. Two are relevant here:
- OpenAI Compatible — one endpoint for every model. The gateway maps the model ID you enter to the right vendor, so Claude, GPT, and Gemini all work from a single configuration.
- Anthropic — native Anthropic protocol with a custom base URL. Cleaner for Claude-only workflows, but limited to Claude models.
Most people want Option A — it covers every model and every Roo Code mode from one slot.
Option A: OpenAI Compatible provider (recommended)
-
Open Roo Code settings
Click the Roo Code icon in the VS Code Activity Bar, then the gear icon → API Configuration.
-
Set API Provider to "OpenAI Compatible"
In the API Provider dropdown, choose OpenAI Compatible.
-
Enter the base URL and key
Base URL:
https://tokenprovider.store/v1
API Key: paste the key from your Keys page.Keep the
/v1suffix for the OpenAI-Compatible provider. The native Anthropic provider (Option B) and Claude Code use the base URL without /v1. -
Set a model ID
Type a full model ID in the Model field, e.g.:
claude-sonnet-4-20250514— day-to-day codingclaude-opus-4-20250514— hard refactors / planninggpt-4o— quick questions
Roo Code passes the model ID straight through; the gateway routes it to the right vendor.
-
Save and verify
Save the config, open the Roo Code chat, and run a small task like "explain this file." A normal response means it is routing correctly — and the call appears on your Usage page with tokens billed and a request ID.
Option B: Anthropic provider with custom base URL
If you only use Claude and prefer the native protocol:
- API Provider → Anthropic.
- Paste your key, tick Use custom base URL, and enter
https://tokenprovider.store(no/v1). - Select a Claude model. Roo Code now sends native
messagesrequests.
Assign cheaper models per mode
Roo Code's biggest cost lever is its modes. You don't need Opus for everything — assign a model per mode and your bill drops without losing quality where it matters:
| Mode | Job | Sensible model |
|---|---|---|
| Code | Writing & editing files | Sonnet — strong default |
| Architect | Planning multi-file changes | Opus for big plans, Sonnet otherwise |
| Ask | Quick Q&A, no edits | GPT-4o / Haiku — cheap & fast |
| Debug | Tracing failures | Sonnet |
| Orchestrator | Delegating subtasks | Sonnet, with cheaper models on the children |
Create one API config profile per model in Roo Code, then bind a profile to each mode.
Common errors
| Error | Cause | Fix |
|---|---|---|
| Connection / 404 | Missing /v1 on the OpenAI-Compatible provider | Use exactly https://tokenprovider.store/v1 |
| "Model not found" | Model ID typo | Use a full versioned ID like claude-sonnet-4-20250514 |
| 401 | Wrong or revoked key | Regenerate on the Keys page |
| 429 | Burst over the current rate limit | Retry; smart routing spreads load and reduces 429s vs a single key |
FAQ
Is Roo Code the same as Cline?
They share lineage but are separate extensions with different mode systems. Both accept a custom OpenAI-compatible or Anthropic endpoint, so the same gateway works for either — see the Cline guide for that setup.
Can I use Claude and GPT in the same project?
Yes. With the OpenAI-Compatible provider you just change the model ID — one key, one base URL, every model.
How do I keep costs down on agent runs?
Assign cheaper models to Ask/Debug modes, keep Opus for Architect, and watch per-request spend on the Usage page. Agent loops make many calls, so per-mode model choice matters more than in a normal chat.
Latency hit?
+20–80 ms over a direct call. For agentic edits, model think-time dominates, so the gateway hop is not what you'll feel.
One endpoint for every Roo Code mode
$1 minimum top-up, pay per token, cancel anytime.
Sign up free → Already a member