To keep the service stable, TokenDog limits how often you can send requests. When you exceed a limit, the API returns HTTP 429 Too Many Requests.

Dimensions

Limits are typically metered per API key and model group. Exact quotas and windows depend on your plan in the console.

Handling 429

On 429, back off and retry with exponential backoff plus jitter to avoid thundering herds:
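import random
import time

from openai import OpenAI, RateLimitError

client = OpenAI(base_url="https://tokendog.io/v1", api_key="YOUR_TOKENDOG_API_KEY")

MAX_ATTEMPTS = 5

def chat_with_retry(**kwargs):
    """Create a chat completion, retrying on HTTP 429 with exponential backoff."""
    for attempt in range(MAX_ATTEMPTS):
        try:
            return client.chat.completions.create(**kwargs)
        except RateLimitError as exc:
            if attempt == MAX_ATTEMPTS - 1:
                # Out of attempts: fail now instead of sleeping once more.
                raise RuntimeError("rate limited after retries") from exc
            # Wait 1s, 2s, 4s, ... plus up to 1s of random jitter.
            time.sleep((2 ** attempt) + random.random())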
For batch jobs, cap concurrency and queue requests on your side. Smoothing the request curve up front works better than retrying after the fact.
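One way to cap concurrency for a batch job is a bounded thread pool. The following is a minimal sketch, not a TokenDog-recommended implementation: `run_capped`, `max_concurrency`, and the default of 4 are hypothetical names and values chosen for illustration, so pick a cap that fits the quota shown for your plan in the console.

```python
from concurrent.futures import ThreadPoolExecutor


def run_capped(jobs, max_concurrency=4):
    """Run zero-argument callables with at most max_concurrency in flight.

    Jobs beyond the cap wait in the pool's internal queue, which smooths
    the request rate on the client side. Results are returned in the same
    order as the input jobs.
    """
    with ThreadPoolExecutor(max_workers=max_concurrency) as pool:
        futures = [pool.submit(job) for job in jobs]
        # result() blocks until the job finishes and re-raises its exception, if any.
        return [future.result() for future in futures]
```

Each job would wrap a single request, for example a lambda that calls the `chat_with_retry` helper above with that job's arguments. The retry loop then only has to absorb occasional 429s rather than a burst caused by unbounded parallelism.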