# Rate-Limited or Blocked? A Multi-Model Failover Plan

## The risk of one provider

If your whole product depends on a single model endpoint, a rate limit, an outage, or a regional block takes your app down with it. The fix is not heroic: keep an ordered list of fallback models.

## The pattern: ordered fallback

Try a primary model first; if the call hits a rate limit or fails with any other error, fall back to the next model in the list. Because ClaudeN AI exposes Claude, GPT and Gemini through one key, failover is just a list of model names.

```python
from openai import OpenAI

# One client for every model: the gateway is reached through the OpenAI SDK.
client = OpenAI(api_key="YOUR_CLAUDEN_KEY", base_url="https://clauden.ai/v1")

# Tried in order, first to last.
FALLBACKS = ["claude-3-5-sonnet", "gpt-4o", "gemini-1.5-pro"]


def ask(prompt: str) -> str:
    last_err = None
    for model in FALLBACKS:
        try:
            r = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return r.choices[0].message.content
        except Exception as e:
            # Any failure (rate limit, outage, block) moves on to the next model.
            last_err = e
            continue
    raise RuntimeError(f"All models failed: {last_err}")


print(ask("Summarize the CAP theorem in two sentences."))
```
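Catching every exception is the simplest version of the pattern, but it also falls through the whole list on errors that no other model is likely to fix, such as a malformed request or a bad key. The sketch below shows one way to separate the two cases. It uses only the standard library, and `ProviderError`, `is_retryable`, `ask_with_failover` and the `call_model` callable are hypothetical names introduced for illustration, not part of any SDK; in real code you would map your client's own exception types onto the same idea.

```python
from typing import Callable, Optional, Sequence


class ProviderError(Exception):
    """Hypothetical error type for this sketch; real SDKs raise their own classes."""

    def __init__(self, status: Optional[int], message: str = ""):
        super().__init__(message or f"provider error (status={status})")
        self.status = status  # None stands for a timeout or connection failure


def is_retryable(err: ProviderError) -> bool:
    """Return True when trying another model could plausibly help."""
    if err.status is None:  # timeout or network failure
        return True
    # 429 is the usual rate-limit status; 5xx means the provider itself failed.
    return err.status == 429 or 500 <= err.status < 600


def ask_with_failover(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str], str],
) -> str:
    """Try each model in order; stop early on errors that are not retryable."""
    last_err: Optional[ProviderError] = None
    for model in models:
        try:
            return call_model(model, prompt)
        except ProviderError as err:
            if not is_retryable(err):
                # e.g. 400 or 401: the next model would most likely fail the same way.
                raise
            last_err = err
    raise RuntimeError(f"All models failed: {last_err}")
```

Here `call_model(model, prompt)` is assumed to wrap whatever client call you already make and to translate its failures into `ProviderError`. Which status codes count as retryable is a judgment call; the set above is a common starting point, not a rule.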

## Tips that make failover reliable

- Order by capability, then cost: best model first, cheapest fallback last.
- Add a short timeout so a slow provider does not block the whole chain.
- Log which model actually served each request so you can tune the order.
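The timeout and logging tips can be sketched together in a few lines. This is a minimal illustration using only the standard library, not a definitive implementation: `ask_logged` and the `call_model` callable are hypothetical names, and the callable is assumed to enforce the timeout it is given.

```python
import logging
import time
from typing import Callable, Optional, Sequence, Tuple

logger = logging.getLogger("failover")


def ask_logged(
    prompt: str,
    models: Sequence[str],
    call_model: Callable[[str, str, float], str],
    timeout_s: float = 10.0,
) -> Tuple[str, str]:
    """Return (model_name, reply) from the first model that answers."""
    last_err: Optional[Exception] = None
    for model in models:
        started = time.monotonic()
        try:
            # call_model is expected to give up after timeout_s seconds.
            reply = call_model(model, prompt, timeout_s)
        except Exception as err:  # broad on purpose, as in the article's ask()
            elapsed = time.monotonic() - started
            logger.warning("model=%s failed after %.2fs: %s", model, elapsed, err)
            last_err = err
            continue
        elapsed = time.monotonic() - started
        # Record which model actually served the request, and how fast.
        logger.info("model=%s served the request in %.2fs", model, elapsed)
        return model, reply
    raise RuntimeError(f"All models failed: {last_err}")
```

With the OpenAI Python SDK, `call_model` could forward `timeout_s` through the per-request `timeout` argument of `client.chat.completions.create`. Returning the model name alongside the reply lets the caller store it with the request, which makes it easier to see how often each fallback is used and to reorder the list accordingly.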

## Why a relay helps

Doing this against three separate providers means three keys, three SDKs, and three billing systems. Through one gateway it is a single list and a single balance — resilience without the operational tax.
