OpenAI-compatible API

Endpoints, parameters, and the response format mirror the OpenAI API. If your code already works with OpenAI, change base_url and the API key and leave everything else unchanged.

Base URL and auth

OpenAI-compatible base URL: https://api.cheapai.io/v1
Anthropic-compatible base URL: https://api.cheapai.io (the client appends /v1/messages itself)
Auth: send the header Authorization: Bearer cai-... with every request. Get the key in the API keys section.

Model name: take it from the catalog (gpt-4o, claude-3-5-sonnet, gemini-1-5-pro, etc.). The gateway routes the request to the right provider based on the model name.
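As a rough illustration of how the base URL, path, and auth header fit together, here is a minimal sketch using only the Python standard library. The helper names `endpoint_url` and `auth_headers` are hypothetical, the `CHEAPAI_API_KEY` environment variable follows the curl example, and `cai-...` is a placeholder rather than a real key. Nothing is sent over the network.

```python
import os
import urllib.request

OPENAI_BASE_URL = "https://api.cheapai.io/v1"
ANTHROPIC_BASE_URL = "https://api.cheapai.io"  # Anthropic clients append /v1/messages


def endpoint_url(base_url, path):
    """Join a base URL and a path without doubling or dropping slashes."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


def auth_headers(api_key):
    """Bearer auth header as described in this section."""
    return {"Authorization": f"Bearer {api_key}"}


# Placeholder key if the environment variable is not set.
api_key = os.environ.get("CHEAPAI_API_KEY", "cai-...")

# Build (but do not send) a request that lists the available models.
models_request = urllib.request.Request(
    endpoint_url(OPENAI_BASE_URL, "models"),
    headers=auth_headers(api_key),
    method="GET",
)

print(models_request.full_url)
print(endpoint_url(ANTHROPIC_BASE_URL, "/v1/messages"))
```

Passing `models_request` to `urllib.request.urlopen` would perform the call. With an OpenAI SDK, the same two values would typically go into the client's `base_url` and API key settings instead.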

Endpoints

Method and path	Purpose
`POST /v1/chat/completions`	Chat completions, the main endpoint for conversational models. Supports `stream`.
`POST /v1/embeddings`	Text vector representations.
`GET /v1/models`	List of available models and their identifiers.
`POST /v1/messages`	Anthropic-compatible endpoint (used by Claude Code and the Anthropic SDK).

POST /v1/chat/completions

Main body parameters:

Field	Type	Description
`model`	string	Required. Model name from the catalog.
`messages`	array	Required. List of messages, each with a `role` and `content`.
`stream`	bool	If `true`, the response arrives in chunks (SSE). Default: `false`.
`temperature`	number	Sampling randomness, usually 0–2. Defaults to whatever the model uses.
`max_tokens`	int	Cap on response length in tokens.
`top_p`, `stop`, `presence_penalty`, `frequency_penalty`	—	Passed to the provider as-is, when supported.
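The fields in the table above could be assembled into a request body along these lines. This is a sketch under stated assumptions: `build_chat_payload` is a hypothetical helper, not part of any SDK, and dropping unset options (so the model's defaults apply) is a choice made here for illustration.

```python
import json


def build_chat_payload(model, messages, **options):
    """Assemble a /v1/chat/completions body from required and optional fields."""
    if not model:
        raise ValueError("model is required")
    if not messages:
        raise ValueError("messages is required")
    payload = {"model": model, "messages": list(messages)}
    # Leave out options set to None so the provider's defaults are used.
    payload.update({key: value for key, value in options.items() if value is not None})
    return payload


payload = build_chat_payload(
    "gpt-4o",
    [{"role": "user", "content": "What is a token in an LLM?"}],
    temperature=0.2,
    max_tokens=100,
    top_p=None,  # unset: omitted from the body
)
print(json.dumps(payload, indent=2))
```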

Request:

curl

curl https://api.cheapai.io/v1/chat/completions \
  -H "Authorization: Bearer $CHEAPAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      {"role": "system", "content": "Answer briefly."},
      {"role": "user", "content": "What is a token in an LLM?"}
    ]
  }'
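
For comparison, a rough Python equivalent of the curl request above, using only the standard library. The helpers `build_chat_request` and `extract_reply` are hypothetical names, and reading the reply from `choices[0].message.content` assumes the response follows the OpenAI chat completion shape, as this page states. The network call runs only when `CHEAPAI_API_KEY` is set.

```python
import json
import os
import urllib.request

API_URL = "https://api.cheapai.io/v1/chat/completions"


def build_chat_request(api_key, model, messages):
    """Build the same POST request as the curl example, without sending it."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_reply(response_json):
    """Pull the assistant text out of an OpenAI-style response (assumed shape)."""
    return response_json["choices"][0]["message"]["content"]


messages = [
    {"role": "system", "content": "Answer briefly."},
    {"role": "user", "content": "What is a token in an LLM?"},
]

api_key = os.environ.get("CHEAPAI_API_KEY")
if api_key:  # skip the network call when no key is configured
    request = build_chat_request(api_key, "gpt-4o", messages)
    with urllib.request.urlopen(request, timeout=30) as response:
        print(extract_reply(json.load(response)))
```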
