# Rate-Limited or Blocked? A Multi-Model Failover Plan

## The risk of one provider

If your whole product depends on a single model endpoint, then a rate limit, an outage, or a regional block takes your app down with it. The fix is not heroic: it is a fallback list.

## The pattern: ordered fallback

Try a primary model; on a rate-limit or error, fall back to the next one. Because ClaudeN AI exposes Claude, GPT and Gemini through one key, failover is just a list of model names.

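```python
from openai import OpenAI

# One key and one base URL cover every model in the fallback list.
client = OpenAI(api_key="YOUR_CLAUDEN_KEY", base_url="https://clauden.ai/v1")

# Tried in order: the first entry is the primary, the rest are fallbacks.
FALLBACKS = ["claude-3-5-sonnet", "gpt-4o", "gemini-1.5-pro"]

def ask(prompt: str) -> str:
    last_err = None
    for model in FALLBACKS:
        try:
            r = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return r.choices[0].message.content
        except Exception as e:
            # Rate limit, outage, or any other error: move on to the next model.
            last_err = e
            continue
    # Every model failed; surface the last error as the cause.
    raise RuntimeError(f"All models failed: {last_err}") from last_err

print(ask("Summarize the CAP theorem in two sentences."))
```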

## Tips that make failover reliable

- Order by capability, then cost: best model first, cheapest fallback last.
- Add a short timeout so a slow provider does not block the whole chain.
- Log which model actually served each request so you can tune the order.
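
The timeout and logging tips can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: `ask_with_failover`, the `call` parameter, and the `failover` logger name are hypothetical names introduced here. The request itself is passed in as a function so the failover logic stays independent of any particular SDK.

```python
import logging
import time
from typing import Callable, Sequence

logger = logging.getLogger("failover")  # hypothetical logger name

# Assumed shape of the request function: (model, prompt, timeout_seconds) -> reply text.
CallFn = Callable[[str, str, float], str]

def ask_with_failover(
    prompt: str,
    models: Sequence[str],
    call: CallFn,
    timeout_s: float = 10.0,
) -> tuple[str, str]:
    """Try each model in order; return (model_that_answered, reply)."""
    errors: list[str] = []
    for model in models:
        started = time.monotonic()
        try:
            # The timeout is handed to the request function, which must enforce it.
            reply = call(model, prompt, timeout_s)
        except Exception as exc:  # broad on purpose: any failure moves to the next model
            elapsed = time.monotonic() - started
            logger.warning("model=%s failed after %.2fs: %s", model, elapsed, exc)
            errors.append(f"{model}: {exc}")
            continue
        # Record which model actually served the request.
        elapsed = time.monotonic() - started
        logger.info("model=%s served request in %.2fs", model, elapsed)
        return model, reply
    raise RuntimeError("All models failed: " + "; ".join(errors))
```

With the OpenAI Python client, `call` could be a small wrapper around `client.chat.completions.create(...)` that passes `timeout=timeout_s` on each request. Returning the model name alongside the reply makes it easy to count how often each fallback is used and to reorder the list accordingly.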

## Why a relay helps

Doing this against three separate providers means three keys, three SDKs, and three billing systems. Through one gateway it is a single list and a single balance: resilience without the operational tax.
