Gemini API Example (2026)

Google's Gemini 2.5 Pro and Flash are competitive on context length (1M+ tokens) and multimodal input (image, audio, video, PDF). Below: working calls in Python, Node.js, and curl, plus the OpenAI-compatible endpoint that lets existing OpenAI client code target Gemini with a two-line change.

Two ways to call Gemini: the native generativelanguage.googleapis.com endpoint, or the OpenAI-compatible endpoint that mimics /v1/chat/completions. Pick OpenAI-compatible if you already have OpenAI client code.

1. Get an API key

Get a free key from Google AI Studio. The free tier is generous enough for prototyping; beyond its rate limits you'll need to enable the paid tier. Save the key as an env var:

export GEMINI_API_KEY="AIza…"

2. Python (native SDK)

pip install google-genai
import os
from google import genai

# Reads the key exported in step 1.
client = genai.Client(api_key=os.environ["GEMINI_API_KEY"])

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Explain quantum entanglement in one paragraph.",
)
print(response.text)

3. Streaming

stream = client.models.generate_content_stream(
    model="gemini-2.5-flash",
    contents="Count from 1 to 10 with one number per line.",
)

for chunk in stream:
    print(chunk.text, end="", flush=True)

4. Multimodal: image + text

pip install pillow
from PIL import Image

img = Image.open("./screenshot.png")

response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[img, "What does this UI mockup show?"],
)
print(response.text)

The same pattern accepts audio (.mp3), video, and PDF. For files larger than ~20 MB use the File API to upload first, then reference by URI.
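
A minimal sketch of that File API flow, reusing the client from section 2 (the file path is a placeholder; model names may also need a version suffix depending on the API version you target):

# Upload once; the returned handle references the hosted file by URI.
upload = client.files.upload(file="./lecture.mp3")

# Pass the handle in contents just like an inline image.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents=[upload, "Summarize this recording in three bullet points."],
)
print(response.text)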

5. Node.js / TypeScript

npm install @google/genai
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY! });

const response = await ai.models.generateContent({
  model: "gemini-2.5-flash",
  contents: "Write a one-line haiku about TypeScript.",
});

console.log(response.text);

6. curl (raw HTTP)

curl -sS \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "contents": [{
      "parts": [{"text": "Capital of Brazil?"}]
    }]
  }'
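
The reply text is nested at candidates[0].content.parts[0].text in the JSON. To print just that field, pipe the same command through a one-liner (assuming python3 is on your PATH):

curl -sS \
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent" \
  -H "x-goog-api-key: $GEMINI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"contents": [{"parts": [{"text": "Capital of Brazil?"}]}]}' |
  python3 -c "import json, sys; print(json.load(sys.stdin)['candidates'][0]['content']['parts'][0]['text'])"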

7. OpenAI-compatible endpoint (drop-in)

Gemini also exposes an OpenAI-shaped endpoint. Existing OpenAI Python / JS code works by changing two lines:

import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["GEMINI_API_KEY"],
    base_url="https://generativelanguage.googleapis.com/v1beta/openai/",
)

response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Hello in Portuguese."}],
)
print(response.choices[0].message.content)
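
The compatibility layer also supports streaming; a minimal sketch reusing the client configured above:

stream = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Count to five."}],
    stream=True,
)

for chunk in stream:
    # The final chunk's delta can be empty; guard before printing.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)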

8. Long context (1M tokens)

Gemini 2.5 Pro accepts up to 1M input tokens, large enough to drop an entire codebase or a multi-hour transcript into one prompt. Best practice is to chunk long inputs and use context caching for repeat queries, which substantially cuts the cost of resending the same tokens.
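
A sketch of context caching with the google-genai SDK, reusing the section 2 client (the file path is a placeholder, and note that caches enforce a minimum input size, so small files are rejected):

from google.genai import types

# Upload the large input once and pin it in a cache for an hour.
doc = client.files.upload(file="./codebase_dump.txt")

cache = client.caches.create(
    model="gemini-2.5-flash",
    config=types.CreateCachedContentConfig(contents=[doc], ttl="3600s"),
)

# Repeat queries reference the cache instead of resending the tokens.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Where is the authentication logic defined?",
    config=types.GenerateContentConfig(cached_content=cache.name),
)
print(response.text)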

Pricing snapshot (May 2026)

Model              Input / 1M tokens   Output / 1M tokens   Best for
gemini-2.5-flash   $0.075              $0.30                Cheap default
gemini-2.5-pro     $1.25               $5.00                Reasoning + long context

Verify current rates on Google AI Studio.

One key for all three vendors

TokenProvider forwards Gemini requests through the same key you use for Claude and ChatGPT. Switch the model in the request body: no second account, no separate billing, pay per token with no monthly fee.

# OpenAI client → Gemini via TokenProvider
from openai import OpenAI

client = OpenAI(
    api_key="your_proxy_key",
    base_url="https://tokenprovider.store/v1",
)
response = client.chat.completions.create(
    model="gemini-2.5-flash",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
