Claude API Python Example (2026)
How do you call the Claude API from Python? Below are copy-paste-ready examples:
two SDKs, streaming, function calling, multi-turn chat, and proper error handling.
The default base_url in every snippet points at TokenProvider, which
costs 50%+ less than calling Anthropic directly.
Three-line version: install openai, set
base_url to tokenprovider.store, use model="claude-sonnet-4".
Every snippet below runs as-is.
Prerequisites
- Python 3.8+
- A TokenProvider API key (free signup, $1 minimum top-up)
- Pick an SDK:
openai (universal) or anthropic (official)
pip install openai # Option A: OpenAI SDK (also calls GPT, Gemini)
pip install anthropic # Option B: Anthropic official SDK
Option A: OpenAI SDK (recommended)
One SDK for every model — swap by changing model:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_TOKENPROVIDER_KEY",
    base_url="https://tokenprovider.store/v1",
)

resp = client.chat.completions.create(
    model="claude-sonnet-4",  # or claude-opus, claude-haiku, gpt-4o, gemini-2-flash
    messages=[
        {"role": "system", "content": "You are a senior Python engineer."},
        {"role": "user", "content": "Write an efficient typed quicksort."},
    ],
    max_tokens=800,
)

print(resp.choices[0].message.content)
print(f"Used {resp.usage.total_tokens} tokens")
Gotchas:
- base_url must end in /v1 (the OpenAI protocol requires it)
- Use the Anthropic model name directly (no anthropic/ prefix)
- resp.usage has the billable token counts
Option B: Anthropic official SDK
Claude-only with Anthropic's native Messages format:
import anthropic

client = anthropic.Anthropic(
    api_key="YOUR_TOKENPROVIDER_KEY",
    base_url="https://tokenprovider.store",  # NOTE: no /v1 suffix
)

msg = client.messages.create(
    model="claude-sonnet-4",
    max_tokens=800,
    system="You are a senior Python engineer.",
    messages=[
        {"role": "user", "content": "Write an efficient typed quicksort."},
    ],
)

print(msg.content[0].text)
print(f"Input {msg.usage.input_tokens} Output {msg.usage.output_tokens}")
Watch the base_url: the OpenAI SDK wants /v1, the Anthropic SDK doesn't.
Mixing them up gets you a 404.
Streaming (token-by-token)
For chat UIs and CLIs:
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_KEY",
    base_url="https://tokenprovider.store/v1",
)

stream = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "Explain Python asyncio in 3 paragraphs."}],
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:  # some chunks carry no choices
        print(chunk.choices[0].delta.content, end="", flush=True)
print()
In streaming mode, usage typically arrives in the final chunk.
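The Anthropic SDK (Option B) ships its own streaming helper. A minimal sketch, reusing the client from that section:

with client.messages.stream(
    model="claude-sonnet-4",
    max_tokens=800,
    messages=[{"role": "user", "content": "Explain Python asyncio in 3 paragraphs."}],
) as stream:
    for text in stream.text_stream:  # yields text deltas as they arrive
        print(text, end="", flush=True)
print()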
Function calling (tools)
Ask Claude for a structured function call:
import json
from openai import OpenAI

client = OpenAI(api_key="YOUR_KEY", base_url="https://tokenprovider.store/v1")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up current weather for a city",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string", "description": "City name"}
            },
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{"role": "user", "content": "What's the weather in Tokyo?"}],
    tools=tools,
)

tool_call = resp.choices[0].message.tool_calls[0]  # assumes the model chose to call the tool
args = json.loads(tool_call.function.arguments)
print(f"Claude wants to call {tool_call.function.name}({args})")
Feed the actual weather back as a tool-role message; Claude integrates it into the final answer.
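A sketch of that round trip, continuing from the snippet above. The weather payload is hard-coded stand-in data, and it assumes TokenProvider relays OpenAI-format tool messages to Claude:

messages = [{"role": "user", "content": "What's the weather in Tokyo?"}]
messages.append(resp.choices[0].message)  # the assistant turn that requested the tool
messages.append({
    "role": "tool",
    "tool_call_id": tool_call.id,
    "content": json.dumps({"city": "Tokyo", "temp_c": 21, "conditions": "clear"}),  # stand-in result
})

final = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=messages,
    tools=tools,
)
print(final.choices[0].message.content)  # Claude folds the tool result into a natural answer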
Multi-turn conversation
history = [
    {"role": "system", "content": "You are a concise Python tutor."},
]

def ask(question: str) -> str:
    history.append({"role": "user", "content": question})
    resp = client.chat.completions.create(  # client configured as in Option A
        model="claude-sonnet-4",
        messages=history,
    )
    answer = resp.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

print(ask("Difference between await and yield from?"))
print(ask("What about asyncio.gather vs wait?"))  # second turn carries context
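The full history is re-sent on every call, so input tokens grow with each turn. A crude cap, as a hypothetical trim helper that keeps the system prompt plus the most recent messages:

def trim(history: list, keep_last: int = 10) -> list:
    # keep the system prompt (index 0) plus the last keep_last messages
    return history[:1] + history[1:][-keep_last:]

# then inside ask(): messages=trim(history)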
Error handling
from openai import OpenAI, RateLimitError, AuthenticationError, APIStatusError

client = OpenAI(api_key="YOUR_KEY", base_url="https://tokenprovider.store/v1")

try:
    resp = client.chat.completions.create(
        model="claude-sonnet-4",
        messages=[{"role": "user", "content": "hello"}],
        timeout=30,
    )
except AuthenticationError:
    print("Bad key: check that the API key is valid")
except RateLimitError:
    print("Rate limited: retry later, or ask support for a limit raise")
except APIStatusError as e:  # base APIError has no status_code, so catch the status variant
    print(f"API error: {e.status_code} {e.message}")
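For transient 429s, wrapping the call in exponential backoff is usually enough. A minimal sketch, with arbitrary attempt counts and delays; the SDK also retries some failures itself via OpenAI(max_retries=...):

import time
from openai import RateLimitError

def create_with_retry(client, max_attempts: int = 4, **kwargs):
    for attempt in range(max_attempts):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise
            time.sleep(2 ** attempt)  # back off 1s, 2s, 4s between attempts

# resp = create_with_retry(client, model="claude-sonnet-4", messages=[...])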
Saving money
- Use Haiku for bulk tasks: classification, summaries, and tagging run fine on Haiku 3.5 at ~1/10 the Sonnet price
- Enable prompt caching: repeated long system prompts cache to ~10% input cost on hit (see the sketch after this list)
- Cap max_tokens: output is billed per token — set a realistic ceiling
- Sticky sessions: consecutive calls from the same key reuse upstream account cache
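A prompt-caching sketch using the Anthropic SDK client from Option B, assuming TokenProvider forwards Anthropic's cache_control marker upstream; LONG_SYSTEM_PROMPT stands in for your own stable instructions:

LONG_SYSTEM_PROMPT = "You are a senior Python engineer. ..."  # imagine several KB of stable instructions

msg = client.messages.create(
    model="claude-sonnet-4",
    max_tokens=800,
    system=[{
        "type": "text",
        "text": LONG_SYSTEM_PROMPT,
        "cache_control": {"type": "ephemeral"},  # cache the prompt up to this block
    }],
    messages=[{"role": "user", "content": "Review this function for bugs: ..."}],
)

Anthropic only caches prompts above a minimum token length, so very short system prompts won't benefit.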
Grab a key and run the code above
Signup comes with trial credit. No monthly fee. Pay only for tokens you use.
FAQ
Which SDK should I use?
If your project also calls GPT or Gemini, use openai — one client, all models. If you're Claude-only and need native features (vision, document parts), use anthropic.
Does Claude Vision work for image inputs?
Yes. OpenAI SDK uses {"type": "image_url"}; Anthropic SDK uses {"type": "image", "source": {...}}.
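For instance, via the OpenAI SDK (the image URL is a placeholder):

resp = client.chat.completions.create(
    model="claude-sonnet-4",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image in one sentence."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(resp.choices[0].message.content)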
How do I control concurrency?
Use httpx.AsyncClient or asyncio.Semaphore. TokenProvider has generous per-key limits but burst-heavy callers will still see 429.
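A sketch with AsyncOpenAI and a semaphore capping in-flight requests (the limit of 5 and the prompts are arbitrary):

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(api_key="YOUR_KEY", base_url="https://tokenprovider.store/v1")
sem = asyncio.Semaphore(5)  # at most 5 concurrent requests

async def ask(prompt: str) -> str:
    async with sem:
        resp = await client.chat.completions.create(
            model="claude-sonnet-4",
            messages=[{"role": "user", "content": prompt}],
        )
        return resp.choices[0].message.content

async def main() -> None:
    prompts = [f"One-line summary of PEP {n}" for n in (8, 20, 484)]
    print(await asyncio.gather(*map(ask, prompts)))

asyncio.run(main())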
Can I call Claude Code from Python?
Claude Code is a CLI, not a direct API. See the Claude Code setup guide.