Endpoints, parameters, and response format mirror the OpenAI API. If your code already works with OpenAI, change base_url and the API key and leave everything else as is.
Base URL (OpenAI-compatible): https://api.cheapai.io/v1
Base URL (Anthropic-compatible): https://api.cheapai.io (the client appends /v1/messages itself)
Authentication: send the header Authorization: Bearer cai-.... Get the key on the API keys page.
Model name: take it from the catalog, for example gpt-4o, claude-3-5-sonnet, or gemini-1-5-pro. The gateway routes the request to the right provider based on the model name.
| Method and path | Purpose |
|---|---|
| POST /v1/chat/completions | Chat completions, the main endpoint for conversational models. Supports stream. |
| POST /v1/embeddings | Vector representations of text. |
| GET /v1/models | List of available models and their identifiers. |
| POST /v1/messages | Anthropic-compatible endpoint (used by Claude Code and the Anthropic SDK). |
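As a rough illustration of how the OpenAI-compatible endpoints in the table share one base URL and one auth header, here is a minimal sketch using only the Python standard library. The helper name build_request is hypothetical, and the CHEAPAI_API_KEY variable name is borrowed from the curl examples on this page; nothing is sent over the network until the request is passed to urlopen.

```python
import json
import os
import urllib.request

BASE_URL = "https://api.cheapai.io/v1"  # OpenAI-compatible base URL from this page


def build_request(method, path, body=None, api_key=None):
    """Build (but do not send) a request to an OpenAI-compatible endpoint.

    build_request is a hypothetical helper, not part of any SDK.
    """
    key = api_key or os.environ.get("CHEAPAI_API_KEY", "cai-...")
    headers = {"Authorization": f"Bearer {key}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path, data=data, headers=headers, method=method
    )


# Preparing a request needs no network access.
req = build_request("GET", "/models", api_key="cai-example")
print(req.get_method(), req.full_url)
```

To actually send it, pass the request to urllib.request.urlopen(req) and decode the JSON body of the response.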
Main body parameters:
| Field | Type | Description |
|---|---|---|
| model | string | Required. Model name from the catalog. |
| messages | array | Required. List of messages, each with a role and content. |
| stream | bool | If true, the response arrives in chunks (SSE). Default: false. |
| temperature | number | Randomness, usually 0–2. Defaults to the model's own setting. |
| max_tokens | int | Cap on response length in tokens. |
| top_p, stop, presence_penalty, frequency_penalty | various | Passed to the provider as-is, when supported. |
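The table above can be read as a recipe for the JSON body. The sketch below assembles one and drops optional fields that are unset, so the model's own defaults apply. build_chat_body is a hypothetical helper, and its validation is a simplification for illustration, not a documented gateway rule.

```python
import json


def build_chat_body(model, messages, **options):
    """Assemble a /v1/chat/completions body (hypothetical helper).

    Optional fields left as None are dropped, so the model's own
    defaults apply instead of an explicit null.
    """
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    for message in messages:
        # Simplified check: plain chat messages carry both keys.
        if "role" not in message or "content" not in message:
            raise ValueError("each message needs a role and content")
    body = {"model": model, "messages": list(messages)}
    body.update({k: v for k, v in options.items() if v is not None})
    return body


body = build_chat_body(
    "gpt-4o",
    [{"role": "user", "content": "What is a token in an LLM?"}],
    temperature=0.2,
    max_tokens=100,
    top_p=None,  # unset: left out of the body
)
print(json.dumps(body, indent=2))
```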
Request:
curl
curl https://api.cheapai.io/v1/chat/completions \
  -H "Authorization: Bearer $CHEAPAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "Answer briefly."},
      {"role": "user", "content": "What is a token in an LLM?"}
    ]
  }'
Response:
json
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1730000000,
  "model": "gpt-4o",
  "choices": [
    {
      "index": 0,
      "message": {"role": "assistant", "content": "The minimal unit of text..."},
      "finish_reason": "stop"
    }
  ],
  "usage": {"prompt_tokens": 21, "completion_tokens": 14, "total_tokens": 35}
}
Same via SDK:
python
from openai import OpenAI
# Point the official OpenAI SDK at the gateway: only base_url and api_key change.
client = OpenAI(base_url="https://api.cheapai.io/v1", api_key="cai-...")
resp = client.chat.completions.create(
    model="claude-3-5-sonnet",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)
node (openai sdk)
import OpenAI from "openai";
const client = new OpenAI({
  baseURL: "https://api.cheapai.io/v1",
  apiKey: "cai-...",
});
const resp = await client.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(resp.choices[0].message.content);
With "stream": true, the response arrives as a text/event-stream stream: each event is a data: line carrying a chat.completion.chunk object, where choices[0].delta holds the next piece of the reply. The stream ends with the line data: [DONE].
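If you are not using an SDK, the framing just described can be parsed by hand. The following is a minimal sketch, assuming each event fits on a single data: line as in the chat completions format; iter_sse_content is a hypothetical helper and the sample lines are illustrative, not captured from the gateway.

```python
import json


def iter_sse_content(lines):
    """Yield content pieces from an iterable of SSE text lines.

    Simplified: handles only single-line `data:` events.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        choices = chunk.get("choices") or []
        if not choices:
            continue
        piece = choices[0].get("delta", {}).get("content")
        if piece:
            yield piece


sample = [
    'data: {"choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"Hel"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{"content":"lo"},"finish_reason":null}]}',
    "",
    'data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}',
    "",
    "data: [DONE]",
]
text = "".join(iter_sse_content(sample))
print(text)  # prints "Hello"
```

In real use, the lines would come from the HTTP response body read line by line and decoded as UTF-8.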
event stream
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"Hel"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{"content":"lo"},"finish_reason":null}]}
data: {"id":"chatcmpl-abc123","object":"chat.completion.chunk","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
python · streaming
stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Count to five."}],
    stream=True,
)
# Print each piece as it arrives; content is None on chunks that carry no text.
for chunk in stream:
    delta = chunk.choices[0].delta.content or ""
    print(delta, end="", flush=True)
curl
curl https://api.cheapai.io/v1/embeddings \
  -H "Authorization: Bearer $CHEAPAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model": "text-embedding-3", "input": "sample text"}'
The response contains a data array, where each item's embedding field is a list of numbers, plus a usage field. input can be a string or an array of strings (batch).
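For batch input, it helps to map the returned vectors back to the texts that produced them. The sketch below does this on a hand-written stand-in for a response. Sorting by index is a defensive assumption based on the OpenAI response shape, where each data item carries its input position; extract_embeddings is a hypothetical helper.

```python
def extract_embeddings(response, expected_count):
    """Return embedding vectors in input order (hypothetical helper)."""
    # Items without an index keep their original order (sort is stable).
    items = sorted(response["data"], key=lambda item: item.get("index", 0))
    vectors = [item["embedding"] for item in items]
    if len(vectors) != expected_count:
        raise ValueError(f"expected {expected_count} vectors, got {len(vectors)}")
    return vectors


inputs = ["first text", "second text"]
request_body = {"model": "text-embedding-3", "input": inputs}  # batch request

# Stand-in for the parsed JSON response; real vectors are much longer.
fake_response = {
    "data": [
        {"index": 1, "embedding": [0.0, 1.0]},
        {"index": 0, "embedding": [1.0, 0.0]},
    ],
    "usage": {"prompt_tokens": 4, "total_tokens": 4},
}

vectors = extract_embeddings(fake_response, len(inputs))
by_text = dict(zip(inputs, vectors))
print(by_text["first text"])  # prints [1.0, 0.0]
```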
For tools built around Anthropic (Claude Code, the Anthropic SDK), there is a native /v1/messages endpoint. Set the base URL without /v1 here: the client appends the path itself.
terminal
export ANTHROPIC_BASE_URL="https://api.cheapai.io"
export ANTHROPIC_AUTH_TOKEN="cai-..."
Details: Claude Code.
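To make the URL rule concrete, here is a minimal sketch of the request an Anthropic-style client would assemble from those two settings. The body follows the Anthropic Messages API shape, where max_tokens is required. The anthropic-version header is what the official SDK sends; whether this gateway requires it is an assumption, and build_messages_request is a hypothetical helper.

```python
import json
import urllib.request

ANTHROPIC_BASE_URL = "https://api.cheapai.io"  # no /v1 suffix
AUTH_TOKEN = "cai-..."  # placeholder key


def build_messages_request(model, user_text, max_tokens=256):
    """Build (but do not send) a POST to the Anthropic-compatible endpoint."""
    body = {
        "model": model,
        "max_tokens": max_tokens,  # required by the Anthropic Messages API
        "messages": [{"role": "user", "content": user_text}],
    }
    return urllib.request.Request(
        ANTHROPIC_BASE_URL + "/v1/messages",  # the path the client appends
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {AUTH_TOKEN}",
            "Content-Type": "application/json",
            # Sent by the official SDK; assumed, not confirmed, for this gateway.
            "anthropic-version": "2023-06-01",
        },
        method="POST",
    )


req = build_messages_request("claude-3-5-sonnet", "Hello!")
print(req.full_url)  # prints https://api.cheapai.io/v1/messages
```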
In a non-streaming response, the usage field contains prompt_tokens, completion_tokens, and total_tokens. Combined with the model's tariff, this gives the request cost; the final amount charged per request is shown in your usage history. For how billing works, see the Pricing page.
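As a worked example of that arithmetic, the sketch below estimates the cost of one request from its usage field. The tariff values are made up for illustration, and the split into separate input and output rates per million tokens is an assumption about how a tariff is expressed; the real rates and the authoritative amount are on the Pricing page and in your usage history.

```python
from decimal import Decimal

# Hypothetical tariff in USD per 1M tokens; not real prices.
EXAMPLE_TARIFF = {
    "input_per_million": Decimal("2.50"),
    "output_per_million": Decimal("10.00"),
}


def estimate_cost(usage, tariff):
    """Estimate request cost from token counts and per-million rates."""
    million = Decimal(1_000_000)
    input_cost = Decimal(usage["prompt_tokens"]) * tariff["input_per_million"] / million
    output_cost = Decimal(usage["completion_tokens"]) * tariff["output_per_million"] / million
    return input_cost + output_cost


usage = {"prompt_tokens": 21, "completion_tokens": 14, "total_tokens": 35}
cost = estimate_cost(usage, EXAMPLE_TARIFF)
print(f"estimated cost: ${cost}")
```

Decimal is used instead of float so that very small per-request amounts add up without rounding drift.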
Each model has a request-rate limit; if you exceed it, you get a 429 response and should retry with exponential backoff. The full list of codes and how to handle them is on the Error codes page.
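One way to implement that retry is sketched below. RateLimited is a hypothetical stand-in for whatever your HTTP client raises on a 429, and the attempt count, delays, and jitter are illustrative defaults rather than values recommended by the gateway.

```python
import random
import time


class RateLimited(Exception):
    """Hypothetical stand-in for an HTTP 429 error from your client."""


def call_with_backoff(call, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Run call(); on RateLimited, wait base_delay * 2**attempt and retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Up to 10% jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 10))


# Demo with a fake call that is rate-limited twice, then succeeds.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return "ok"


waits = []  # record delays instead of actually sleeping
result = call_with_backoff(flaky, sleep=waits.append)
print(result, len(waits))  # prints "ok 2"
```

With the OpenAI Python SDK, the exception to catch would be openai.RateLimitError; the SDK client also accepts a max_retries option that may make a hand-written loop unnecessary.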
Next: Error codes · Developer FAQ · Model catalog.