Rate-Limited or Blocked? A Multi-Model Failover Plan

## The risk of one provider
If your whole product depends on a single model endpoint, then a rate limit, an outage, or a regional block on that endpoint takes your app down with it. The fix is not heroic: it is an ordered fallback list.

## The pattern: ordered fallback
Try a primary model; if it fails with a rate limit or another error, fall back to the next one. Because ClaudeN AI exposes Claude, GPT and Gemini through one key, failover is just a list of model names.
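```python
import openai
from openai import OpenAI

# One key and one base URL reach every model behind the ClaudeN AI gateway.
client = OpenAI(api_key="YOUR_CLAUDEN_KEY", base_url="https://clauden.ai/v1")

# Tried in order: the first model that answers wins.
FALLBACKS = ["claude-3-5-sonnet", "gpt-4o", "gemini-1.5-pro"]


def ask(prompt: str) -> str:
    last_err = None
    for model in FALLBACKS:
        try:
            r = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return r.choices[0].message.content
        except openai.APIError as e:
            # Rate limits (429), timeouts, connection failures and 5xx responses
            # all subclass openai.APIError: remember the error, try the next model.
            last_err = e
    raise RuntimeError(f"All models failed: {last_err}") from last_err


print(ask("Summarize the CAP theorem in two sentences."))
```
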
## Tips that make failover reliable
- Order by capability, then cost: best model first, cheapest fallback last.
- Add a short timeout, and lower the SDK's automatic retries, so a slow or rate-limited provider does not stall the whole chain.
- Log which model actually served each request so you can tune the order.
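The tips above can be folded into the loop itself. The sketch below is a provider-agnostic variant, assuming you pass in a `call(model, prompt)` function that does the actual request; that keeps it runnable without network access. It logs every failure, records which model served the request, and lets you choose which exception types count as retryable. The names `ask_with_failover` and `FailoverResult` are hypothetical helpers for illustration, not part of any SDK.

```python
import logging
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple, Type

log = logging.getLogger("failover")


@dataclass
class FailoverResult:
    model: str  # which model actually served the request
    text: str


def ask_with_failover(
    prompt: str,
    models: Sequence[str],
    call: Callable[[str, str], str],
    retryable: Tuple[Type[BaseException], ...] = (Exception,),
) -> FailoverResult:
    """Try each model in order; return the first successful answer."""
    errors: List[Tuple[str, BaseException]] = []
    for model in models:
        try:
            text = call(model, prompt)
        except retryable as e:
            # Only the listed exception types trigger failover; anything
            # else propagates immediately so real bugs are not hidden.
            log.warning("model %s failed: %r", model, e)
            errors.append((model, e))
            continue
        log.info("request served by %s", model)
        return FailoverResult(model=model, text=text)
    raise RuntimeError(f"All models failed: {errors}")


# Offline usage example with a fake provider whose first model times out.
def fake_call(model: str, prompt: str) -> str:
    if model == "primary":
        raise TimeoutError("provider too slow")
    return f"{model} says: {prompt}"


result = ask_with_failover("hello", ["primary", "backup"], fake_call)
print(result.model)  # prints "backup"
```

With the OpenAI SDK pointed at the gateway, `call` could wrap `client.chat.completions.create(...)` and return `choices[0].message.content`. Setting `timeout=` and `max_retries=0` on the `OpenAI(...)` constructor keeps a slow or rate-limited model from holding up the chain, and passing `retryable=(openai.APIError,)` lets programming errors surface instead of silently triggering failover.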
## Why a relay helps
Doing this against three separate providers means three keys, three SDKs, and three billing systems. Through one gateway it is a single list and a single balance: resilience without the operational tax.