Claude API Python Example (2026)

How do you call the Claude API from Python? Below are copy-paste-ready examples: two SDKs, streaming, function calling, multi-turn chat, and proper error handling. Every snippet defaults base_url to TokenProvider, which costs at least 50% less than calling Anthropic directly.

Three-line version: install openai, set base_url to tokenprovider.store, use model="claude-sonnet-4". Every snippet below runs as-is.

Prerequisites

  • Python 3.8+
  • A TokenProvider API key (free signup, $1 minimum top-up)
  • Pick an SDK: openai (universal) or anthropic (official)
pip install openai       # Option A: OpenAI SDK (also calls GPT, Gemini)
pip install anthropic    # Option B: Anthropic official SDK

Option A: OpenAI SDK (recommended)

One SDK for every model — swap by changing model:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_TOKEN_HOME_KEY",
    base_url="https://tokenprovider.store/v1",
)

resp = client.chat.completions.create(
    model="claude-sonnet-4",   # or claude-opus, claude-haiku, gpt-4o, gemini-2-flash
    messages=[
        {"role": "system", "content": "You are a senior Python engineer."},
        {"role": "user", "content": "Write an efficient typed quicksort."},
    ],
    max_tokens=800,
)

print(resp.choices[0].message.content)
print(f"Used {resp.usage.total_tokens} tokens")

Gotchas:

  • base_url ends in /v1 — required by the OpenAI protocol
  • Use the Anthropic model name directly (no anthropic/ prefix)
  • resp.usage has the billable token counts

Option B: Anthropic official SDK

Claude-only with Anthropic's native Messages format:

import anthropic

client = anthropic.Anthropic(
    api_key="YOUR_TOKEN_HOME_KEY",
    base_url="https://tokenprovider.store",   # NOTE: no /v1 suffix
)

msg = client.messages.create(
    model="claude-sonnet-4",
    max_tokens=800,
    system="You are a senior Python engineer.",
    messages=[
        {"role": "user", "content": "Write an efficient typed quicksort."},
    ],
)

print(msg.content[0].text)
print(f"Input {msg.usage.input_tokens}  Output {msg.usage.output_tokens}")

Watch the base_url: OpenAI SDK wants /v1, Anthropic SDK doesn't. Mixing them up gets you 404.
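A tiny helper (hypothetical, not part of either SDK) makes the suffix rule impossible to forget:

```python
def base_url_for(sdk: str) -> str:
    """Return the right TokenProvider base_url for a given SDK.

    The OpenAI SDK expects the /v1 suffix; the Anthropic SDK builds its
    own paths and must NOT get /v1.
    """
    root = "https://tokenprovider.store"
    if sdk == "openai":
        return root + "/v1"
    if sdk == "anthropic":
        return root
    raise ValueError(f"unknown sdk: {sdk!r}")
```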

Streaming (token-by-token)

For chat UIs and CLIs:

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_KEY",
    base_url="https://tokenprovider.store/v1",
)

stream = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Explain Python asyncio in 3 paragraphs."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()

In streaming mode, usage typically arrives in the final chunk; with the OpenAI SDK, pass stream_options={"include_usage": True} to request it.

Function calling (tools)

Ask Claude for a structured function call:

import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://tokenprovider.store/v1")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

if resp.choices[0].message.tool_calls:   # Claude may also answer in plain text
    tool_call = resp.choices[0].message.tool_calls[0]
    args = json.loads(tool_call.function.arguments)
    print(f"Claude wants to call {tool_call.function.name}({args})")

Feed the actual weather back as a tool-role message; Claude integrates it into the final answer.

Multi-turn conversation

history = [
    {"role": "system", "content": "You are a concise Python tutor."},
]

def ask(question: str) -> str:
    history.append({"role": "user", "content": question})
    resp = client.chat.completions.create(
        model="claude-sonnet-4",
        messages=history,
    )
    answer = resp.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

print(ask("Difference between await and yield from?"))
print(ask("What about asyncio.gather vs wait?"))   # second turn carries context
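One caveat with this pattern: `history` grows without bound, and every turn re-sends the whole list (and bills its tokens). A naive trimmer, sketched below, keeps the system message plus the most recent messages; it counts messages, not tokens, so for exact budgeting you would measure with a tokenizer instead.

```python
def trim_history(history, keep_turns: int = 10):
    """Keep the system message plus the most recent `keep_turns` messages."""
    system = [m for m in history if m["role"] == "system"]
    rest = [m for m in history if m["role"] != "system"]
    return system + rest[-keep_turns:]
```

Call `history[:] = trim_history(history)` at the top of `ask()` to cap cost per turn.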

Error handling

from openai import OpenAI, RateLimitError, AuthenticationError, APIStatusError

client = OpenAI(api_key="YOUR_KEY", base_url="https://tokenprovider.store/v1")

try:
    resp = client.chat.completions.create(
        model="claude-sonnet-4",
        messages=[{"role": "user", "content": "hello"}],
        timeout=30,
    )
except AuthenticationError:
    print("Bad key: check that your API key is valid")
except RateLimitError:
    print("Rate limit: retry with backoff or request a limit increase")
except APIStatusError as e:
    print(f"API error: {e.status_code} {e.message}")

Saving money

  1. Use Haiku for bulk tasks: classification, summaries, and tagging run fine on Haiku 3.5 at ~1/10 the Sonnet price
  2. Enable prompt caching: repeated long system prompts cache to ~10% input cost on hit
  3. Cap max_tokens: output is billed per token — set a realistic ceiling
  4. Sticky sessions: consecutive calls from the same key reuse upstream account cache

Grab a key and run the code above

Signup comes with trial credit. No monthly fee. Pay only for tokens you use.


FAQ

Which SDK should I use?

If your project also calls GPT or Gemini, use openai — one client, all models. If you're Claude-only and need native features (vision, document parts), use anthropic.

Does Claude Vision work for image inputs?

Yes. OpenAI SDK uses {"type": "image_url"}; Anthropic SDK uses {"type": "image", "source": {...}}.

How do I control concurrency?

Use httpx.AsyncClient or asyncio.Semaphore. TokenProvider has generous per-key limits but burst-heavy callers will still see 429.
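The semaphore pattern can be sketched as a small helper; the commented AsyncOpenAI usage is an untested assumption about how you would wire it up.

```python
import asyncio

async def bounded_gather(coro_fns, limit: int = 5):
    """Run coroutine factories with at most `limit` in flight at once."""
    sem = asyncio.Semaphore(limit)

    async def run(make_coro):
        async with sem:          # blocks when `limit` calls are in flight
            return await make_coro()

    return await asyncio.gather(*(run(f) for f in coro_fns))

# With the async client it might look like (sketch):
# client = AsyncOpenAI(api_key="YOUR_KEY", base_url="https://tokenprovider.store/v1")
# tasks = [lambda p=p: client.chat.completions.create(
#              model="claude-sonnet-4",
#              messages=[{"role": "user", "content": p}])
#          for p in prompts]
# results = asyncio.run(bounded_gather(tasks, limit=5))
```

Passing factories (lambdas) rather than coroutine objects matters: a coroutine created eagerly would start the request before the semaphore is acquired.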

Can I call Claude Code from Python?

Claude Code is a CLI, not a direct API. See the Claude Code setup guide.