Roo Code Custom LLM Setup

Roo Code is an autonomous coding agent for VS Code, and it lets you bring any OpenAI-compatible or Anthropic endpoint. Point it at a single gateway and you get Claude, Claude Code, ChatGPT, and Gemini behind one base URL and one key — with pay-as-you-go, per-request billing instead of four separate plans. Setup time: 3 minutes.

The fast path: Roo Code → settings (gear) → API Provider = OpenAI Compatible → Base URL https://tokenprovider.store/v1, paste your key, set a model ID like claude-sonnet-4-20250514. Done.

Two ways to connect Roo Code

Roo Code exposes several API providers. Two are relevant here:

  1. OpenAI Compatible — one endpoint for every model. The gateway maps the model ID you enter to the right vendor, so Claude, GPT, and Gemini all work from a single configuration.
  2. Anthropic — native Anthropic protocol with a custom base URL. Cleaner for Claude-only workflows, but limited to Claude models.

Most people want Option A — it covers every model and every Roo Code mode from one slot.

Option A: OpenAI Compatible provider (recommended)

  1. Open Roo Code settings

    Click the Roo Code icon in the VS Code Activity Bar, then the gear icon → API Configuration.

  2. Set API Provider to "OpenAI Compatible"

    In the API Provider dropdown, choose OpenAI Compatible.

  3. Enter the base URL and key

    Base URL: https://tokenprovider.store/v1
    API Key: paste the key from your Keys page.

    Keep the /v1 suffix for the OpenAI-Compatible provider. The native Anthropic provider (Option B) and Claude Code use the base URL without /v1.

  4. Set a model ID

    Type a full model ID in the Model field, e.g.:

    • claude-sonnet-4-20250514 — day-to-day coding
    • claude-opus-4-20250514 — hard refactors / planning
    • gpt-4o — quick questions

    Roo Code passes the model ID straight through; the gateway routes it to the right vendor.

  5. Save and verify

    Save the config, open the Roo Code chat, and run a small task like "explain this file." A normal response means it is routing correctly — and the call appears on your Usage page with tokens billed and a request ID.

Option B: Anthropic provider with custom base URL

If you only use Claude and prefer the native protocol:

  1. API Provider → Anthropic.
  2. Paste your key, tick Use custom base URL, and enter https://tokenprovider.store (no /v1).
  3. Select a Claude model. Roo Code now sends native messages requests.

Assign cheaper models per mode

Roo Code's biggest cost lever is its modes. You don't need Opus for everything — assign a model per mode and your bill drops without losing quality where it matters:

ModeJobSensible model
CodeWriting & editing filesSonnet — strong default
ArchitectPlanning multi-file changesOpus for big plans, Sonnet otherwise
AskQuick Q&A, no editsGPT-4o / Haiku — cheap & fast
DebugTracing failuresSonnet
OrchestratorDelegating subtasksSonnet, with cheaper models on the children

Create one API config profile per model in Roo Code, then bind a profile to each mode.

Common errors

ErrorCauseFix
Connection / 404Missing /v1 on the OpenAI-Compatible providerUse exactly https://tokenprovider.store/v1
"Model not found"Model ID typoUse a full versioned ID like claude-sonnet-4-20250514
401Wrong or revoked keyRegenerate on the Keys page
429Burst over the current rate limitRetry; smart routing spreads load and reduces 429s vs a single key

FAQ

Is Roo Code the same as Cline?

They share lineage but are separate extensions with different mode systems. Both accept a custom OpenAI-compatible or Anthropic endpoint, so the same gateway works for either — see the Cline guide for that setup.

Can I use Claude and GPT in the same project?

Yes. With the OpenAI-Compatible provider you just change the model ID — one key, one base URL, every model.

How do I keep costs down on agent runs?

Assign cheaper models to Ask/Debug modes, keep Opus for Architect, and watch per-request spend on the Usage page. Agent loops make many calls, so per-mode model choice matters more than in a normal chat.

Latency hit?

+20–80 ms over a direct call. For agentic edits, model think-time dominates, so the gateway hop is not what you'll feel.

One endpoint for every Roo Code mode

$1 minimum top-up, pay per token, cancel anytime.

Sign up free → Already a member